Abstract

An order-revealing encryption scheme gives a public procedure by which two ciphertexts 
can be compared to reveal the ordering of their underlying plaintexts. We show how to use 
order-revealing encryption to separate computationally efficient PAC learning from efficient 
$(\varepsilon, \delta)$-differentially private PAC learning. That is, we construct a concept class that is efficiently
PAC learnable, but for which every efficient learner fails to be differentially private. This answers
a question of Kasiviswanathan et al. (FOCS ’08, SIAM J. Comput. ’11).

To prove our result, we give a generic transformation from an order-revealing encryption 
scheme into one with strongly correct comparison, which enables the consistent comparison 
of ciphertexts that are not obtained as the valid encryption of any message. We believe this 
construction may be of independent interest. 
1 Introduction


Many agencies hold sensitive information about individuals, where statistical analysis of this data 
could yield great societal benefit. The line of work on differential privacy [DMNS06] aims to enable 
such analysis while giving a strong formal guarantee on the privacy afforded to individuals. Noting 
that the framework of computational learning theory captures many of these statistical tasks, 
Kasiviswanathan et al. [KLN+11] initiated the study of differentially private learning. Roughly
speaking, a differentially private learner is required to output a classification of labeled examples 
that is accurate, but does not change significantly based on the presence or absence of any individual 
example. 

The early positive results in private learning established that, ignoring computational complexity,
any concept class is privately learnable with a number of samples logarithmic in the size of the
concept class [KLN+11]. Since then, a number of works have improved our understanding of the
sample complexity - the minimum number of examples - required by such learners to simultaneously
achieve accuracy and privacy. Some of these works showed that privacy incurs an inherent
additional cost in sample complexity; that is, some concept classes require more samples to learn
privately than they require to learn without privacy [BKN10, CH11, BNS13, FX14, CHS14, BNSV15].
In this work, we address the complementary question of whether there is also a computational price
of differential privacy for learning tasks, for which much less is known. The initial work of
Kasiviswanathan et al. [KLN+11] identified the important question of whether any efficiently PAC
learnable concept class is also efficiently privately learnable, but only limited progress has been
made on this question since then [BKN10, Nis14].

Our main result gives a strong negative answer to this question. We exhibit a concept class 
that is efficiently PAC learnable, but under plausible cryptographic assumptions cannot be learned 
efficiently and privately. To prove this result, we establish a connection between private learning 
and order-revealing encryption. We construct a new order-revealing encryption scheme with strong 
correctness properties that may be of independent learning-theoretic and cryptographic interest. 

1.1 Differential Privacy and Private Learning 

We first recall Valiant’s (distribution-free) PAC model for learning [Val84]. Let $C$ be a concept
class consisting of concepts $c : X \to \{0,1\}$ for a data universe $X$. A learner $L$ is given $n$ samples
of the form $(x_i, c(x_i))$, where the $x_i$'s are drawn i.i.d. from an unknown distribution and are
labeled according to an unknown concept $c$. The goal of the learner is to output a hypothesis
$h : X \to \{0,1\}$ from a hypothesis class $H$ that approximates $c$ well on the unknown distribution.
That is, the probability that $h$ disagrees with $c$ on a fresh example from the unknown distribution
should be small - say, less than 0.05. The hypothesis class $H$ may be different from $C$, but in the
case where $H \subseteq C$ we call $L$ a proper learner. Moreover, we say a learner is efficient if it runs in
time polynomial in the description size of $c$ and the size of its examples.

Kasiviswanathan et al. [KLN+11] defined a private learner to be a PAC learner that is also
differentially private. Two samples $S = \{(x_1, b_1), \dots, (x_n, b_n)\}$ and $S' = \{(x'_1, b'_1), \dots, (x'_n, b'_n)\}$ are
said to be neighboring if they differ on exactly one example, which we think of as corresponding to
one individual’s information. A randomized learner $L : (X \times \{0,1\})^n \to H$ is $(\varepsilon, \delta)$-differentially
private if for all neighboring datasets $S$ and $S'$ and all sets $T \subseteq H$,

$$\Pr[L(S) \in T] \le e^{\varepsilon} \Pr[L(S') \in T] + \delta.$$

The original definition of differential privacy [DMNS06] took $\delta = 0$, a case which is called pure
differential privacy. The definition with positive $\delta$, called approximate differential privacy, first
appeared in [DKM+06] and has since been shown to enable substantial accuracy gains. Throughout
this introduction, we will think of $\varepsilon$ as a small constant, e.g. $\varepsilon = 0.1$, and $\delta = o(1/n)$.
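To make the inequality concrete, the following is a minimal sketch (not part of the original construction) that checks the $(\varepsilon, \delta)$ condition for two explicitly given output distributions, standing in for the distributions of $L(S)$ and $L(S')$ on one fixed neighboring pair. The function names and the randomized-response example are illustrative assumptions. The check uses the fact that, for discrete distributions, the worst-case set $T$ consists of exactly those outcomes whose probability under one distribution exceeds $e^{\varepsilon}$ times its probability under the other.

```python
import math

def satisfies_dp(p, q, eps, delta, tol=1e-12):
    """Check Pr_p[T] <= e^eps * Pr_q[T] + delta for every set T, in both directions.

    p and q are dicts mapping outcomes to probabilities (hypothetical stand-ins
    for the output distributions of a learner on two neighboring samples).
    """
    def worst_gap(a, b):
        # The worst-case T collects the outcomes where a exceeds e^eps * b.
        outcomes = set(a) | set(b)
        return sum(max(a.get(o, 0.0) - math.exp(eps) * b.get(o, 0.0), 0.0)
                   for o in outcomes)
    return worst_gap(p, q) <= delta + tol and worst_gap(q, p) <= delta + tol

def randomized_response(bit, eps):
    """Distribution of reporting `bit` truthfully with probability e^eps / (1 + e^eps)."""
    keep = math.exp(eps) / (1.0 + math.exp(eps))
    return {bit: keep, 1 - bit: 1.0 - keep}

p = randomized_response(0, 0.1)
q = randomized_response(1, 0.1)
print(satisfies_dp(p, q, eps=0.1, delta=0.0))    # pure DP holds at eps = 0.1
print(satisfies_dp(p, q, eps=0.05, delta=0.0))   # fails for a smaller eps
print(satisfies_dp({"a": 1.0}, {"b": 1.0}, eps=0.1, delta=0.01))  # deterministic outputs that differ
```

The last call illustrates why a learner whose output deterministically reveals one of its examples cannot be private for small $\delta$: the two output distributions are disjoint.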

Kasiviswanathan et al. [KLN+11] gave a generic “Private Occam’s Razor” algorithm, showing
that any concept class $C$ can be privately (properly) learned using $O(\log |C|)$ samples. Unfortunately,
this algorithm runs in time $\Omega(|C|)$, which is exponential in the description size of each
concept. With an eye toward designing efficient private learners, Blum et al. [BDMN05] made the
powerful observation that any efficient learning algorithm in the statistical queries (SQ) framework 
of Kearns [Kea98] can be efficiently simulated with differential privacy. Moreover, Kasiviswanathan 
et al. [KLN+11] showed that the efficient learner for the concept class of parity functions based 
on Gaussian elimination can also be implemented efficiently with differential privacy. These two 
techniques - SQ learning and Gaussian elimination - are essentially the only methods known for 
computationally efficient PAC learning. The fact that these can both be implemented privately led 
Kasiviswanathan et al. [KLN+11] to ask whether all efficiently learnable concept classes could also 
be efficiently learned with differential privacy. 

Beimel et al. [BKN10] made partial progress toward this question in the special case of pure 
differential privacy with proper learning, showing that the sample complexity of efficient learners 
can be much higher than that of inefficient ones. Specifically, they showed that assuming the 
existence of pseudorandom generators with exponential stretch, there exists for any $\ell(d) = \omega(\log d)$
a concept class over $\{0,1\}^d$ for which every efficient proper private learner requires $\Omega(d)$ samples, but
an inefficient proper private learner only requires $O(\ell(d))$ examples. Nissim [Nis14] strengthened
this result substantially for “representation learning,” where a proper learner is further restricted 
to output a canonical representation of its hypothesis. He showed that, assuming the existence of 
one-way functions, there exists a concept class that is efficiently representation learnable, but not 
efficiently privately representation learnable (even with approximate differential privacy). With 
Nissim’s kind permission, we give the details of this construction in Section 5. 

Despite these negative results for proper learning, one might still have hoped that any efficiently 
learnable concept class could be efficiently improperly learned with privacy. Indeed, a number of 
works have shown that, especially with differential privacy, improper learning can be much more 
powerful than proper learning. For instance, Beimel et al. [BKN10] showed that under pure
differential privacy, the simple class of Point functions (indicators of a single domain element)
requires $\Omega(d)$ samples to privately learn properly, but only $O(\log d)$ samples to privately learn
improperly. Moreover, computational separations are known between proper and improper learning
even without privacy considerations. Pitt and Valiant [PV88] showed that unless NP = RP, $k$-term
DNF are not efficiently properly learnable, but they are efficiently improperly learnable [Val84].

Under plausible cryptographic assumptions, we resolve the question of Kasiviswanathan et al. 
[KLN+11] in the negative, even for improper learners. The assumption we need is the existence of 
“strongly correct” order-revealing encryption (ORE) schemes, described in Section 1.3. 

Theorem 1.1 (Informal). Assuming the existence of strongly correct ORE, there exists an efficiently
computable concept class EncThresh that is efficiently PAC learnable, but not efficiently
learnable by any $(\varepsilon, \delta)$-differentially private algorithm.

We stress that this result holds even for improper learners and for the relaxed notion of approximate
differential privacy. We remark that cryptography has played a major role in shaping
our understanding of the computational complexity of learning in a number of models (e.g.
[Val84, KV94, Kha95, Ser00]). It has also been used before to show separations between what is
efficiently learnable in different models (e.g. [Blu94, SG04]).

1.2 Our Techniques 

We give an informal overview of the construction and analysis of the concept class EncThresh. 

We first describe the concept class of thresholds Thresh and its simple PAC learning algorithm.
Consider the domain $[N] = \{1, \dots, N\}$. Given a number $t \in [N]$, a threshold concept $c_t$ is defined
by $c_t(x) = 1$ if and only if $x \le t$. The concept class of thresholds admits a simple and efficient
proper PAC learning algorithm $L_{\mathrm{Thresh}}$. Given a sample $\{(x_1, c_t(x_1)), \dots, (x_n, c_t(x_n))\}$ labeled by
an unknown concept $c_t$, the learner $L_{\mathrm{Thresh}}$ identifies the largest positive example $x_{i^*}$ and outputs
the hypothesis $h = c_{x_{i^*}}$. That is, $L_{\mathrm{Thresh}}$ chooses the threshold concept that minimizes the empirical
error on its sample. To achieve a small constant error on any underlying distribution on examples,
it suffices to take $n = O(1)$ samples.

A simple but important observation about $L_{\mathrm{Thresh}}$ is that it is completely oblivious to the actual
numeric values of its examples, or even to the fact that the domain is $[N]$. In fact, $L_{\mathrm{Thresh}}$ works
equally well on any totally ordered domain on which it can efficiently compare examples. In an
extreme case, the learner $L_{\mathrm{Thresh}}$ still works when its examples are encrypted under an
order-revealing encryption (ORE) scheme, which guarantees that $L_{\mathrm{Thresh}}$ is able to learn the order of
its examples, but nothing else about them. Up to small technical modifications, our concept class
EncThresh is exactly the class Thresh where examples are encrypted under an ORE scheme.
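The obliviousness of the threshold learner can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the learner analyzed later in the paper: `learn_threshold` and `compare_ints` are hypothetical names, and the comparison callback stands in for whatever public comparison procedure the domain offers (plain integer comparison here, or the Comp procedure of an ORE scheme). The learner never inspects an example except through that callback.

```python
def compare_ints(a, b):
    """Comparison procedure for plain integers; returns '<', '=', or '>'."""
    return "<" if a < b else (">" if a > b else "=")

def learn_threshold(sample, compare):
    """Return a hypothesis built only from pairwise comparisons.

    sample  : list of (example, label) pairs with labels in {0, 1}
    compare : callable returning '<', '=', or '>' for two examples
    """
    positives = [x for x, label in sample if label == 1]
    if not positives:
        # No positive example was seen: predict 0 everywhere.
        return lambda x: 0
    # Find the largest positive example using comparisons only.
    best = positives[0]
    for x in positives[1:]:
        if compare(x, best) == ">":
            best = x
    # Hypothesis: x is positive iff x is at most the largest positive example.
    return lambda x: 1 if compare(x, best) in ("<", "=") else 0

sample = [(3, 1), (10, 0), (5, 1), (8, 0)]  # labeled by the threshold t = 6
h = learn_threshold(sample, compare_ints)
print([h(x) for x in (1, 5, 6, 10)])  # prints [1, 1, 0, 0]
```

Note that the hypothesis is defined in terms of one specific training example (`best`), which is the feature that the privacy lower bound exploits.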

For EncThresh to be efficiently PAC learnable, it must be learnable even under distributions that 
place arbitrary weight on examples corresponding to invalid ciphertexts. To this end, we require a 
“strong correctness” condition on our ORE scheme. The strong correctness condition ensures that 
all ciphertexts, even those that are not obtained as encryptions of messages, can be compared in 
a consistent fashion. This condition is not met by current constructions of ORE, and one of the 
technical contributions of this work is a generic transformation from weakly correct ORE schemes 
to strongly correct ones. 

While a learner similar to $L_{\mathrm{Thresh}}$ is able to efficiently PAC learn the concept class EncThresh,
we argue that it cannot do so while preserving differential privacy with respect to its examples. 
Intuitively, the security of the ORE scheme ensures that essentially the only thing a learner for 
EncThresh can do is output a hypothesis that compares an example to one it already has. We make 
this intuition precise by giving an algorithm that traces the hypothesis output by any efficient 
learner back to one of the examples used to produce it. This formalization builds conceptually 
on the connection between differential privacy and traitor-tracing schemes (see Section 1.4), but 
requires new ideas to adapt to the PAC learning model. 

1.3 Order-Revealing Encryption 

Motivated by the task of answering range queries on encrypted databases, an order-revealing
encryption (ORE) scheme [BCO11, BLR+15] is a special type of symmetric key encryption scheme
where it is possible to publicly sort ciphertexts according to the order of the plaintexts. More
precisely, the plaintext space of the scheme is the set of integers $[N] = \{1, \dots, N\}$,¹ and in addition to
the private encryption and decryption procedures Enc, Dec, there is a public comparison procedure
Comp that takes as input two ciphertexts, and reveals the order of the corresponding plaintexts.

¹ More generally, any totally ordered plaintext space can be considered.
The notion of best-possible semantic security, defined in Boneh et al. [BLR+15], intuitively
captures the requirement that, given a collection of ciphertexts, no information about the plaintexts
is learned, except for the ordering.

Known constructions of order-revealing encryption. Order-revealing encryption can be
seen as a special case of 2-input functional encryption. In such a scheme, there are several functions
$f_1, \dots, f_k$, and given two ciphertexts $c_0, c_1$ encrypting $m_0, m_1$, it is possible to learn $f_i(m_0, m_1)$ for all
$i \in [k]$. General multi-input functional encryption schemes can be obtained from indistinguishability
obfuscation [GGG+14] or multilinear maps [BLR+15]. It is also possible to build ORE from
single-input functional encryption with function privacy, which means that $f$ is kept secret. Such schemes
can be built from regular single-input schemes without function privacy by work of Brakerski and
Segev [BS15], and such single-input schemes can also be built from obfuscation [GGH+13b] or
multilinear maps [GGHZ14].

Unfortunately, the above constructions are insufficient for our purposes. The issue arises from 
the fact that our learner needs to work for any distribution on ciphertexts, even distributions 
whose support includes malformed ciphertexts. Unfortunately, previous constructions only achieve 
a weak form of correctness, which guarantees that encrypting two messages and then comparing the 
ciphertexts using Comp produces the same result (with overwhelming probability) as comparing the 
plaintexts directly. This requirement only specifies how Comp works on valid ciphertexts, namely 
actual encryptions of messages. Moreover, correctness is only guaranteed for these messages with 
overwhelming probability, meaning even some valid ciphertexts may cause Comp to misbehave. 

For our learner, this weak form of correctness means, for some distributions that place significant 
weight on bad ciphertexts, the comparison procedure is completely useless, and thus the learner 
will fail for these distributions. 

We therefore need a stronger correctness guarantee. We need that, for any two ciphertexts, the
comparison procedure is consistent with decrypting the two ciphertexts and comparing the resulting 
plaintexts. This correctness guarantee is meaningful even for improperly generated ciphertexts. 

We note that none of the existing constructions of order-revealing encryption outlined above
satisfy this stronger notion. For the obfuscation-based schemes, ciphertexts consist of obfuscated
programs. In these schemes, it is easy to describe invalid ciphertexts where the obfuscated program
performs incorrectly, causing the comparison procedure to output the wrong result. In the
multilinear map-based schemes, the underlying instantiations use current “noisy” multilinear maps, such
as [GGH13a]. An invalid ciphertext could, for example, have too much noise, which will cause the
comparison procedure to behave unpredictably.

Obtaining strong correctness. We first argue that, for all existing ORE schemes, the scheme 
can be modified so that Comp is correct for all valid ciphertexts. We then give a generic conversion 
from any ORE scheme with weakly correct comparison, including the tweaked existing schemes, 
into a strongly correct scheme. We simply modify the ciphertext by adding a non-interactive 
zero-knowledge (NIZK) proof that the ciphertext is well-formed, with the common reference string 
added to the public comparison key. Then the decryption and comparison procedures check the 
proof(s), and only output the result (either decryption or comparison) if the proof(s) are valid. 
The (computational) zero-knowledge property of the NIZK implies that the addition of the proof 
to the ciphertext does not affect security. Meanwhile, NIZK soundness implies that any ciphertext 
accepted by the decryption and comparison procedures must be valid, and the weak correctness
property of the underlying ORE implies that for valid ciphertexts, decryption and comparison are 
consistent. The result is that comparisons are consistent with decryption for all ciphertexts, giving 
strong correctness. 

As we need strong correctness for every ciphertext, even hard-to-generate ones, we need the
NIZK proofs to have perfect soundness, as opposed to computational soundness. Such NIZK
proofs were built in [GOS12].

We note also that the conversion outlined above is not specific to ORE, and applies more 
generally to functional encryption schemes. 

1.4 Related Work 

Hardness of Private Query Release. One of the most basic and well-studied statistical tasks
in differential privacy is the problem of releasing answers to counting queries. A counting query
asks, “What fraction of the records in a dataset $D$ satisfy the predicate $q$?” Given a collection of
$k$ counting queries $q_1, \dots, q_k$ from a family $Q$, the goal of a query release algorithm is to release
approximate answers to these queries while preserving differential privacy. A remarkable result of
Blum et al. [BLR08], with subsequent improvements by [DNR+09, DRV10, RR10, HR10, GRU12,
HLM12], showed that an arbitrary sequence of counting queries can be answered accurately with
differential privacy even when $k$ is exponential in the dataset size $n$. Unfortunately, all of these
algorithms that are capable of answering more than $n^2$ queries are inefficient, running in time
exponential in the dimensionality of the data. Moreover, several works [DNR+09, Ull13, BZ14]
have gone on to show that this inefficiency is likely inherent.

These computational lower bounds for private query release rely on a connection between the
hardness of private query release and traitor-tracing schemes, which was first observed by Dwork
et al. [DNR+09]. Traitor-tracing schemes were introduced by Chor, Fiat, and Naor [CFN94] to
help digital content producers identify pirates as they illegally redistribute content. Traitor-tracing
schemes are conceptually analogous to the example reidentification scheme we use to obtain our
hardness result for private learning. Instantiating this connection with the traitor-tracing scheme
of Boneh, Sahai, and Waters [BSW06], which relies on certain assumptions in bilinear groups,
Dwork et al. [DNR+09] exhibited a family of queries for which no efficient algorithm can
produce a data structure which could be used to answer all queries in this family. Very recently,
Boneh and Zhandry [BZ14] constructed a new traitor-tracing scheme based on indistinguishability
obfuscation that yields the same infeasibility result for a family of $n \cdot 2^{O(d)}$ queries on records of
size $d$. Extending this connection, Ullman [Ull13] constructed a specialized traitor-tracing scheme
to show that no efficient private algorithm can answer more than $O(n^2)$ arbitrary queries that are
given as input to the algorithm.

Dwork et al. [DNR+09] also showed strong lower bounds against private algorithms for producing
synthetic data. Synthetic data generation algorithms produce a new “fake” dataset, whose
rows are of the same type as those in the original dataset, with the promise that the answers to
some restricted set of queries on the synthetic dataset well-approximate the answers on the original
dataset. Assuming the existence of one-way functions, Dwork et al. [DNR+09] exhibited an efficiently
computable collection of queries for which no efficient private algorithm can produce useful
synthetic data. Ullman and Vadhan [UV11] refined this result to hold even for extremely simple 
classes of queries. 

Nevertheless, the restriction to synthetic data is significant to these results, and they do not rule 
out the possibility that other privacy-preserving data structures can be used to answer large families
of restricted queries. In fact, when the synthetic data restriction is lifted, there are algorithms (e.g. 
[HRS12, TUV12, CTUW14, DNT14]) that answer queries from certain exponentially large families 
in subexponential time. One can view the problem of synthetic data generation as analogous to 
proper learning. In both cases, placing natural syntactic restrictions on the output of an algorithm 
may in fact come at the expense of utility or computational efficiency. 

Efficiency of SQ Learning. Feldman and Kanade [FK12] addressed the question of whether 
information-theoretically efficient SQ learners - i.e., those making polynomially many queries - 
could be made computationally efficient. One of their main negative results showed that unless 
NP = RP, there exists a concept class with polynomial query complexity that is not efficiently 
SQ learnable. Moreover, this concept class is efficiently PAC learnable, which suggests that the 
restriction to SQ learning can introduce an inherent computational cost. 

We show that the concept class EncThresh can be learned (inefficiently) with polynomially 
many statistical queries. The result of Blum et al. [BDMN05] discussed above, showing that 
SQ learning algorithms can be efficiently simulated by differentially private algorithms, thus shows 
that EncThresh also separates SQ learners making polynomially many queries from computationally 
efficient SQ learners. 

Corollary 1.2 (Informal). Assuming the existence of strongly correct ORE, the concept class
EncThresh is efficiently PAC learnable and has polynomial SQ query complexity, but is not efficiently
SQ learnable.

While our proof relies on much stronger hardness assumptions, it reveals ORE as a new barrier 
to efficient SQ learning. As discussed in more detail in Section 3.3, even though their result is 
about computational hardness, Feldman and Kanade’s choice of a concept class relies crucially 
on the fact that parities are hard to learn in the SQ model even information-theoretically. By 
contrast, our concept class EncThresh is computationally hard to SQ learn for a reason that appears 
fundamentally different than the information-theoretic hardness of SQ learning parities. 

Learning from Encrypted Data. Several works have developed schemes for training, testing, 
and classifying machine learning models over encrypted data (e.g. [GLN13, BPTG14]). In a model 
use case, a client holds a sensitive dataset, and uploads an encrypted version of the dataset to 
a cloud computing service. The cloud service then trains a model over the encrypted data and 
produces an encrypted classifier it can send back to the client, ideally without learning anything 
about the examples it received. The notion of privacy afforded to the individuals in the dataset here 
is complementary to differential privacy. While the cloud service does not learn anything about the 
individuals in the dataset, its output might still depend heavily on the data of certain individuals. 

In fact, our non-differentially private PAC learner for the class EncThresh exactly performs the 
task of learning over encrypted data, producing a classifier without learning anything about its 
examples beyond their order (this addresses the difficulty of implementing comparisons from prior 
work [GLN13]). Thus one can interpret our results as showing that not only are these two notions 
of privacy for machine learning training complementary, but that they may actually be in conflict. 
Moreover, the strong correctness guarantee we provide for ORE (which applies more generally to 
multi-input functional encryption) may help enable the theoretical study of learning from encrypted 
data in other PAC-style settings. 
2 Preliminaries and Definitions


2.1 PAC Learning and Private PAC Learning 

For each $k \in \mathbb{N}$, let $X_k$ be an instance space (such as $\{0,1\}^k$), where the parameter $k$ represents
the size of the elements in $X_k$. Let $C_k$ be a set of boolean functions $\{c : X_k \to \{0,1\}\}$. The
sequence $(X_1, C_1), (X_2, C_2), \dots$ represents an infinite sequence of learning problems defined over
instance spaces of increasing dimension. We will generally suppress the parameter $k$, and refer to
the problem of learning $C$ as the problem of learning $C_k$ for every $k$.

A learner $L$ is given examples sampled from an unknown probability distribution $D$ over $X$,
where the examples are labeled according to an unknown target concept $c \in C$. The learner must
select a hypothesis $h$ from a hypothesis class $H$ that approximates the target concept with respect
to the distribution $D$. More precisely,

Definition 2.1. The generalization error of a hypothesis $h : X \to \{0,1\}$ (with respect to a target
concept $c$ and distribution $D$) is defined by $\mathrm{error}_D(c, h) = \Pr_{x \sim D}[h(x) \ne c(x)]$. If $\mathrm{error}_D(c, h) \le \alpha$
we say that $h$ is an $\alpha$-good hypothesis for $c$ on $D$.

Definition 2.2 (PAC Learning [Val84]). Algorithm $L : (X \times \{0,1\})^n \to H$ is an $(\alpha, \beta)$-accurate
PAC learner for the concept class $C$ using hypothesis class $H$ with sample complexity $n$ if for all
target concepts $c \in C$ and all distributions $D$ on $X$, given an input of $n$ samples
$S = ((x_1, c(x_1)), \dots, (x_n, c(x_n)))$, where each $x_i$ is drawn i.i.d. from $D$, algorithm $L$ outputs a
hypothesis $h \in H$ satisfying $\Pr[\mathrm{error}_D(c, h) \le \alpha] \ge 1 - \beta$. The probability here is taken over
the random choice of the examples in $S$ and the coin tosses of the learner $L$.

The learner $L$ is efficient if it runs in time polynomial in the size parameter $k$, the representation
size of the target concept $c$, and the accuracy parameters $1/\alpha$ and $1/\beta$. Note that a necessary (but
not sufficient) condition for $L$ to be efficient is that its sample complexity $n$ is polynomial in the
learning parameters.

If $H \subseteq C$ then $L$ is called a proper learner. Otherwise, it is called an improper learner.

Kasiviswanathan et al. [KLN+11] defined a private learner as a PAC learner that is also
differentially private. Recall the definition of differential privacy:

Definition 2.3. A learner $L : (X \times \{0,1\})^n \to H$ is $(\varepsilon, \delta)$-differentially private if for all sets $T \subseteq H$
and neighboring sets of examples $S \sim S'$,

$$\Pr[L(S) \in T] \le e^{\varepsilon} \Pr[L(S') \in T] + \delta.$$

The technical object that we will use to show our hardness results for differential privacy is what
we call an example reidentification scheme. It is analogous to the hard-to-sanitize database
distributions [DNR+09, UV11] and re-identifiable database distributions [BUV14] used in prior works
to prove hardness results for private query release, but is adapted to the setting of computational
learning. In the first step, an algorithm $\mathrm{Gen}_{\mathrm{ex}}$ chooses a concept and a sample $S$ labeled according
to that concept. In the second step, a learner $L$ receives either the sample $S$ or the sample $S_{-i}$,
where an appropriately chosen example $i$ is replaced by a junk example, and learns a hypothesis $h$.
Finally, an algorithm $\mathrm{Trace}_{\mathrm{ex}}$ attempts to use $h$ to identify one of the rows given to $L$. If $\mathrm{Trace}_{\mathrm{ex}}$
succeeds at identifying such a row with high probability, then it must be able to distinguish $L(S)$
from $L(S_{-i})$, showing that $L$ cannot be differentially private. We formalize these ideas below.
Definition 2.4. An $(\alpha, \xi)$-example reidentification scheme for a concept class $C$ consists of a pair
of algorithms, $(\mathrm{Gen}_{\mathrm{ex}}, \mathrm{Trace}_{\mathrm{ex}})$, with the following properties.

$\mathrm{Gen}_{\mathrm{ex}}(k, n)$ Samples a concept $c \in C_k$ and an associated distribution $D$. Draws i.i.d. examples
$x_1, \dots, x_n \leftarrow_R D$, and a fixed value $x_0$. Let $S$ denote the labeled sample $((x_1, c(x_1)), \dots, (x_n, c(x_n)))$,
and for any index $i \in [n]$, let $S_{-i}$ denote the sample with the pair $(x_i, c(x_i))$ replaced with
$(x_0, c(x_0))$.

$\mathrm{Trace}_{\mathrm{ex}}(h)$ Takes state shared with $\mathrm{Gen}_{\mathrm{ex}}$ as well as a hypothesis $h$ and identifies an index in $[n]$
(or $\bot$ if none is found).

The scheme obeys the following “completeness” and “soundness” criteria on the ability of $\mathrm{Trace}_{\mathrm{ex}}$
to identify an example given to a learner $L$.

Completeness. A good hypothesis can be traced to some example. That is, for every efficient
learner $L$,

$$\Pr[\mathrm{error}_D(c, h) \le \alpha \,\wedge\, \mathrm{Trace}_{\mathrm{ex}}(h) = \bot] \le \xi.$$

Here, the probability is taken over $h \leftarrow_R L(S)$ and the coins of $\mathrm{Gen}_{\mathrm{ex}}$ and $\mathrm{Trace}_{\mathrm{ex}}$.

Soundness. For every efficient learner $L$, $\mathrm{Trace}_{\mathrm{ex}}$ cannot trace $i$ from the sample $S_{-i}$. That is,
for all $i \in [n]$,

$$\Pr[\mathrm{Trace}_{\mathrm{ex}}(h) = i] \le \xi$$

for $h \leftarrow_R L(S_{-i})$.

We may sometimes relax the completeness condition to hold only under certain restrictions on
$L$’s output (e.g. $L$ is a proper learner or $L$ is a representation learner). In this case, we say that
$(\mathrm{Gen}_{\mathrm{ex}}, \mathrm{Trace}_{\mathrm{ex}})$ is an example reidentification scheme for (properly, representation) learning a class
$C$.


Theorem 2.5. Let $(\mathrm{Gen}_{\mathrm{ex}}, \mathrm{Trace}_{\mathrm{ex}})$ be an $(\alpha, \xi)$-example reidentification scheme for a concept class
$C$. Then for every $\beta > 0$ and polynomial $n(k)$, there is no efficient $(\varepsilon, \delta)$-differentially private
$(\alpha, \beta)$-PAC learner for $C$ using $n$ samples when

$$\delta < \frac{1 - \beta - \xi}{n} - e^{\varepsilon} \xi.$$

In a typical setting of parameters, we will take $\alpha, \beta, \varepsilon = O(1)$ and $\delta, \xi = o(1/n)$, in which case
the inequality in Theorem 2.5 will be satisfied for sufficiently large $n$.
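As a quick sanity check on this parameter regime, the following sketch evaluates the inequality of Theorem 2.5 numerically. The function name and the specific numbers are illustrative assumptions, not values taken from the paper.

```python
import math

def theorem_2_5_applies(n, eps, delta, beta, xi):
    """Return True when delta < (1 - beta - xi) / n - e^eps * xi."""
    return delta < (1.0 - beta - xi) / n - math.exp(eps) * xi

n = 1000
# Constant eps and beta, with delta and xi both o(1/n): the hardness result applies.
print(theorem_2_5_applies(n, eps=0.1, delta=1 / n**2, beta=0.05, xi=1 / n**2))  # True
# With delta as large as 1/n, the inequality fails and the theorem says nothing.
print(theorem_2_5_applies(n, eps=0.1, delta=1 / n, beta=0.05, xi=1 / n**2))     # False
```

The second call reflects the fact that the right-hand side is slightly below $1/n$, so the theorem only rules out learners whose $\delta$ is smaller than roughly $1/n$.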

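Proof. Suppose instead that there were a computationally efficient $(\varepsilon, \delta)$-differentially private
$(\alpha, \beta)$-PAC learner $L$ for $C$ using $n$ samples. Since $L(S)$ is $\alpha$-good with probability at least
$1 - \beta$, completeness implies that $\mathrm{Trace}_{\mathrm{ex}}(L(S))$ outputs some index in $[n]$ with probability at least
$1 - \beta - \xi$. By averaging, there exists an $i \in [n]$ such that
$\Pr[\mathrm{Trace}_{\mathrm{ex}}(L(S)) = i] \ge (1 - \beta - \xi)/n$. However, since $L$ is differentially private,

$$\Pr[\mathrm{Trace}_{\mathrm{ex}}(L(S_{-i})) = i] \ge e^{-\varepsilon} \left( \frac{1 - \beta - \xi}{n} - \delta \right) > \xi,$$

which contradicts the soundness of $(\mathrm{Gen}_{\mathrm{ex}}, \mathrm{Trace}_{\mathrm{ex}})$. □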




2.2 Order-Revealing Encryption 

Definition 2.6. An Order-Revealing Encryption (ORE) scheme is a tuple (Gen, Enc, Dec, Comp)
of algorithms where:

• Gen($1^\lambda$, $1^\ell$) is a randomized procedure that takes as inputs a security parameter $\lambda$ and
plaintext length $\ell$, and outputs a secret encryption/decryption key sk and public parameters
params.

• Enc(sk, $m$) is a potentially randomized procedure that takes as input a secret key sk and a
message $m \in \{0,1\}^\ell$, and outputs a ciphertext $c$.

• Dec(sk, $c$) is a deterministic procedure that takes as input a secret key sk and a ciphertext $c$,
and outputs a plaintext message $m \in \{0,1\}^\ell$ or a special symbol $\bot$.

• Comp(params, $c_0$, $c_1$) is a deterministic procedure that “compares” two ciphertexts, outputting
either “>”, “<”, “=”, or $\bot$.


Correctness. An ORE scheme must satisfy two separate correctness requirements:

• Correct Decryption: This is the standard notion of correctness for an encryption scheme,
which says that decryption succeeds. We will only consider strongly correct decryption, which
requires that decryption always succeeds. For all security parameters $\lambda$, message lengths
$\ell$, and messages $m \in \{0,1\}^\ell$,

$$\Pr[\mathrm{Dec}(\mathrm{sk}, \mathrm{Enc}(\mathrm{sk}, m)) = m : (\mathrm{sk}, \mathrm{params}) \leftarrow \mathrm{Gen}(1^\lambda, 1^\ell)] = 1.$$

• Correct Comparison: We require that the comparison function succeeds. We will consider
two notions, namely strong and weak. In order to define these notions, we first define two
auxiliary functions:

- $\mathrm{Comp}_{\mathrm{plain}}(m_0, m_1)$ is just the plaintext comparison function. That is, for $m_0 < m_1$,
$\mathrm{Comp}_{\mathrm{plain}}(m_0, m_1)$ = “<”, $\mathrm{Comp}_{\mathrm{plain}}(m_1, m_0)$ = “>”, and $\mathrm{Comp}_{\mathrm{plain}}(m_0, m_0)$ = “=”.

- $\mathrm{Comp}_{\mathrm{ciph}}(\mathrm{sk}, c_0, c_1)$ is a ciphertext comparison function which uses the secret key. It first
computes $m_b = \mathrm{Dec}(\mathrm{sk}, c_b)$ for $b = 0, 1$. If either $m_0 = \bot$ or $m_1 = \bot$ (in other words,
if either decryption failed), then $\mathrm{Comp}_{\mathrm{ciph}}$ outputs $\bot$. If both $m_0, m_1 \ne \bot$, then the
output is $\mathrm{Comp}_{\mathrm{plain}}(m_0, m_1)$.

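Now we define our comparison correctness notions:

– Weakly Correct Comparison: This informally requires that comparison is consistent
with encryption. For all security parameters $\lambda$, message lengths $\ell$, and messages
$m_0, m_1 \in \{0,1\}^\ell$,

\[ \Pr\left[ \mathsf{Comp}(\mathsf{params}, c_0, c_1) = \mathsf{Comp}_{plain}(m_0, m_1) : \begin{array}{l} (\mathsf{sk}, \mathsf{params}) \leftarrow \mathsf{Gen}(1^\lambda, 1^\ell) \\ c_b \leftarrow \mathsf{Enc}(\mathsf{sk}, m_b) \end{array} \right] = 1. \]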


In particular, for correctly generated ciphertexts, Comp never outputs $\bot$.

– Strongly Correct Comparison: This informally requires that comparison is consistent
with decryption. For all security parameters $\lambda$, message lengths $\ell$, and ciphertexts
$c_0, c_1$,

\[ \Pr[\mathsf{Comp}(\mathsf{params}, c_0, c_1) = \mathsf{Comp}_{ciph}(\mathsf{sk}, c_0, c_1) : (\mathsf{sk}, \mathsf{params}) \leftarrow \mathsf{Gen}(1^\lambda, 1^\ell)] = 1. \]
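The gap between the two comparison notions can be checked exhaustively on a toy example. The sketch below is an illustration only, not a scheme from this paper: it has no key and no security at all, and every name in it (`enc`, `dec`, `comp_naive`, `comp_checked`) is hypothetical. Valid ciphertexts are the even integers, so odd integers play the role of malformed ciphertexts that decrypt to the failure symbol.

```python
from itertools import product

BOT = None   # stands in for the failure symbol
ELL = 3      # toy plaintext length: messages are 0 .. 2**ELL - 1
CIPHERTEXTS = range(2 ** (ELL + 1))  # every string a party could submit

def enc(m):
    # Toy, insecure "encryption": valid ciphertexts are the even integers.
    return 2 * m

def dec(c):
    # Odd integers are malformed and decrypt to BOT.
    return c // 2 if c % 2 == 0 else BOT

def comp_plain(m0, m1):
    return "<" if m0 < m1 else ">" if m0 > m1 else "="

def comp_ciph(c0, c1):
    # Reference comparison: decrypt, then compare plaintexts.
    m0, m1 = dec(c0), dec(c1)
    if m0 is BOT or m1 is BOT:
        return BOT
    return comp_plain(m0, m1)

def comp_naive(c0, c1):
    # Compares ciphertexts directly and never rejects malformed inputs.
    return comp_plain(c0, c1)

def comp_checked(c0, c1):
    # Rejects malformed inputs before comparing.
    if c0 % 2 or c1 % 2:
        return BOT
    return comp_plain(c0, c1)

def weakly_correct(comp):
    msgs = range(2 ** ELL)
    return all(comp(enc(a), enc(b)) == comp_plain(a, b)
               for a, b in product(msgs, repeat=2))

def strongly_correct(comp):
    return all(comp(a, b) == comp_ciph(a, b)
               for a, b in product(CIPHERTEXTS, repeat=2))
```

In this toy, `comp_naive` agrees with plaintext comparison on honestly generated ciphertexts but disagrees with decryption on malformed ones, while `comp_checked` agrees with decryption everywhere.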


Security. For security, we will consider a relaxation of the “best possible” security notion of 
Boneh et al. [BLR + 15]. Namely, we only consider static adversaries that submit all queries at once. 
“Best possible” security is a modification of the standard notion of CPA security for symmetric 
key encryption to block trivial attacks. That is, since the comparison function always leaks the 
order of the plaintexts, the left and right sets of challenge messages must have the same order. In 
our relaxation where all challenge messages are queried at once, we can therefore assume without 
loss of generality that the left and right sequences of messages are sorted in ascending order. For 
simplicity, we will also disallow the adversary from querying on the same message more than once. 
This gives the following definition: 


Definition 2.7. An ORE scheme (Gen, Enc, Dec, Comp) is statically secure if, for all efficient
adversaries $\mathcal{A}$, $|\Pr[W_0] - \Pr[W_1]|$ is negligible, where $W_b$ is the event that $\mathcal{A}$ outputs 1 in the following
experiment:

• $\mathcal{A}$ produces two message sequences $m_1^{(L)} < m_2^{(L)} < \cdots < m_q^{(L)}$ and $m_1^{(R)} < m_2^{(R)} < \cdots < m_q^{(R)}$.

• The challenger runs $(\mathsf{sk}, \mathsf{params}) \leftarrow \mathsf{Gen}(1^\lambda, 1^\ell)$. It then responds to $\mathcal{A}$ with $\mathsf{params}$, as well
as $c_1, \ldots, c_q$ where

\[ c_i = \begin{cases} \mathsf{Enc}(\mathsf{sk}, m_i^{(L)}) & \text{if } b = 0 \\ \mathsf{Enc}(\mathsf{sk}, m_i^{(R)}) & \text{if } b = 1. \end{cases} \]

• $\mathcal{A}$ outputs a guess $b'$ for $b$.

We also consider a weaker definition, which only allows the sequences $m_i^{(L)}$ and $m_i^{(R)}$ to differ
at a single point:

Definition 2.8. An ORE scheme (Gen, Enc, Dec, Comp) is statically single-challenge secure if, for
all efficient adversaries $\mathcal{A}$, $|\Pr[W_0] - \Pr[W_1]|$ is negligible, where $W_b$ is the event that $\mathcal{A}$ outputs 1
in the following experiment:

• $\mathcal{A}$ produces a sequence of messages $m_1 < m_2 < \cdots < m_q$, and challenge messages $m_L, m_R$
such that $m_i < m_L < m_R < m_{i+1}$ for some $i \in [q - 1]$.

• The challenger runs $(\mathsf{sk}, \mathsf{params}) \leftarrow \mathsf{Gen}(1^\lambda, 1^\ell)$. It then responds to $\mathcal{A}$ with $\mathsf{params}$, as well
as $c_1, \ldots, c_q$ where $c_i = \mathsf{Enc}(\mathsf{sk}, m_i)$ and

\[ c^* = \begin{cases} \mathsf{Enc}(\mathsf{sk}, m_L) & \text{if } b = 0 \\ \mathsf{Enc}(\mathsf{sk}, m_R) & \text{if } b = 1. \end{cases} \]

• $\mathcal{A}$ outputs a guess $b'$ for $b$.

We now argue that these two definitions are equivalent up to some polynomial loss in security.

Theorem 2.9. (Gen, Enc, Dec, Comp) is statically secure if and only if it is statically single-challenge
secure.

Proof. We prove that single-challenge security implies many-challenge security through a sequence
of hybrids; the converse is immediate, since a single-challenge adversary is a special case of a
many-challenge adversary whose two sequences differ in one position. Each hybrid will only differ
in the messages $m_i$ that are encrypted, and each adjacent hybrid will only differ in a single message.
The first hybrid will encrypt $m_i^{(L)}$ and the last hybrid will encrypt $m_i^{(R)}$. Thus, by applying the
single-challenge security for each hybrid, we conclude that the first and last hybrids are
indistinguishable, thus showing many-challenge security.

Hybrid $j$ for $j \leq q$.

\[ m_i = \begin{cases} \min(m_i^{(L)}, m_i^{(R)}) & \text{if } i \leq j \\ m_i^{(L)} & \text{if } i > j \end{cases} \]

First, notice that all the $m_i$ are in order since both sequences $m_i^{(L)}$ and $m_i^{(R)}$ are in order. Second,
the only difference between Hybrid $(j - 1)$ and Hybrid $j$ is that $m_j = m_j^{(L)}$ in Hybrid $(j - 1)$
and $m_j = \min(m_j^{(L)}, m_j^{(R)})$ in Hybrid $j$. Thus, single-challenge security implies that each adjacent
hybrid is indistinguishable. Moreover, for $j$ where $m_j^{(L)} \leq m_j^{(R)}$, the two hybrids are actually
identical.

Hybrid $j$ for $j > q$.

\[ m_i = \begin{cases} \min(m_i^{(L)}, m_i^{(R)}) & \text{if } i \leq 2q - j \\ m_i^{(R)} & \text{if } i > 2q - j \end{cases} \]

Again, notice that all the $m_i$ are in order. Moreover, the only difference between Hybrid $(2q - j)$ and
Hybrid $(2q - j + 1)$ is that $m_j = \min(m_j^{(L)}, m_j^{(R)})$ in Hybrid $(2q - j)$ and $m_j = m_j^{(R)}$ in Hybrid
$(2q - j + 1)$. Thus, single-challenge security implies that each adjacent hybrid is indistinguishable.
Moreover, for $j$ where $m_j^{(L)} \geq m_j^{(R)}$, the two hybrids are actually identical.

□ 
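The hybrid sequence in the proof of Theorem 2.9 can be generated mechanically. The following sketch is illustrative only: the function name `hybrids` and the list-based, 0-indexed representation are assumptions made for the example. It builds Hybrids 0 through 2q from two sorted message sequences, moving first from the left sequence to the pointwise minimum and then from the minimum to the right sequence.

```python
def hybrids(left, right):
    """Return the message sequences of Hybrid 0, ..., Hybrid 2q.

    left and right are strictly increasing lists of equal length q.
    """
    q = len(left)
    assert len(right) == q
    low = [min(a, b) for a, b in zip(left, right)]
    seqs = []
    # Hybrid j for j <= q: positions 1..j hold the minimum, the rest hold left.
    for j in range(q + 1):
        seqs.append(low[:j] + left[j:])
    # Hybrid j for j > q: positions 1..2q-j hold the minimum, the rest hold right.
    for j in range(q + 1, 2 * q + 1):
        k = 2 * q - j
        seqs.append(low[:k] + right[k:])
    return seqs
```

For example, `hybrids([1, 5, 9], [2, 4, 10])` starts at the left sequence, ends at the right sequence, and changes at most one entry per step while every intermediate sequence stays sorted.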


3 The Concept Class EncThresh and its Learnability 

Let (Gen, Enc, Dec, Comp) be a statically secure ORE scheme with strongly correct comparison. We
define a concept class EncThresh, which intuitively captures the class of threshold functions where
examples are encrypted under the ORE scheme. Throughout this discussion, we will take $N = 2^\ell$
and regard the plaintext space of the ORE scheme to be $[N] = \{1, \ldots, N\}$. Ideally we would like,
for each threshold $t \in [N + 1]$ and each $(\mathsf{sk}, \mathsf{params}) \leftarrow \mathsf{Gen}(1^\lambda)$, to define a concept

\[ f_{t,\mathsf{sk},\mathsf{params}}(c) = \begin{cases} 1 & \text{if } \mathsf{Dec}_{\mathsf{sk}}(c) < t \\ 0 & \text{otherwise.} \end{cases} \]


However, we need to make a few technical modifications to ensure that EncThresh is efficiently PAC
learnable.

1. In order for the learner to be able to use the comparison function Comp, it must be given the
public parameters params generated by the ORE scheme. We address this in the natural way 
by attaching a set of public parameters to each example. Moreover, we define EncThresh so 
that each concept is supported on the single set of public parameters that corresponds to the 
secret key used for encryption and decryption. 

2. Only a subset of binary strings form valid (sk, params) pairs that are actually produced by Gen 
in the ORE scheme. To represent concepts, we need a reasonable encoding scheme for these 
valid pairs. The encoding scheme we choose is the polynomial-length sequence of random 
coin tosses used by the algorithm Gen to produce (sk, params). 

We now formally describe the concept class EncThresh. Each concept is parameterized by a
string $r$, representing the coin tosses of the algorithm Gen, and a threshold $t \in [N + 1]$ for $N = 2^\ell$.
In what follows, let $(\mathsf{sk}^r, \mathsf{params}^r)$ be the keys output by $\mathsf{Gen}(1^\lambda, 1^\ell)$ when run on the sequence of
coin tosses $r$. Let

\[ f_{t,r}(\mathsf{params}, c) = \begin{cases} 1 & \text{if } (\mathsf{params} = \mathsf{params}^r) \wedge (\mathsf{Dec}(\mathsf{sk}^r, c) \neq \bot) \wedge (\mathsf{Dec}(\mathsf{sk}^r, c) < t) \\ 0 & \text{otherwise.} \end{cases} \]

Notice that given $t$ and $r$, the concept can be efficiently evaluated. The description length $k$ of
the instance space $X = \{0,1\}^k$ is polynomial in the security parameter $\lambda$ and plaintext length $\ell$.


3.1 An Efficient PAC Learner for EncThresh 

We argue that EncThresh is efficiently PAC learnable by formalizing the argument from the
introduction. Because we need to include the ORE public parameters in each example, the PAC learner
L (Algorithm 1) for EncThresh actually works in two stages. In the first stage, L determines whether
there is significant probability mass on examples corresponding to some public parameters params.
Recall that each concept in EncThresh is supported on exactly one such set of parameters. If there
is no significant mass on any params, then the all-zeroes hypothesis is a good hypothesis. On the
other hand, if there is a heavy set of parameters, the learner L applies Comp using those parameters
to learn a good comparator.

Theorem 3.1. Let $\alpha, \beta > 0$. There exists a PAC learning algorithm $L$ for the concept class
EncThresh achieving error $\alpha$ and confidence $1 - \beta$. Moreover, $L$ is efficient (running in time
polynomial in the parameters $k$, $1/\alpha$, $\log(1/\beta)$).


Proof. Fix a target concept $f_{t,r} \in \mathsf{EncThresh}_k$ and a distribution $\mathcal{D}$ on examples. First observe
that the learner $L$ always outputs a hypothesis with one-sided error, i.e. we always have $h \leq f_{t,r}$
pointwise. Also observe that $f_{t',r} \leq f_{t,r}$ pointwise for any $t' \leq t$. These both follow from the strong
correctness of the ORE scheme. Let $(\mathsf{sk}^r, \mathsf{params}^r)$ denote the keys output by $\mathsf{Gen}(1^\lambda, 1^\ell)$ when
run on the sequence of coin tosses $r$. Let POS denote the set of examples $(\mathsf{params}, c)$ on which
$f_{t,r}(\mathsf{params}, c) = 1$. We divide the analysis of the learner into two cases based on the weight $\mathcal{D}$
places on POS.

Algorithm 1 Learner L for EncThresh

1. Request examples $\{(\mathsf{params}_1, c_1, b_1), \ldots, (\mathsf{params}_n, c_n, b_n)\}$ for $n = \lceil \log(1/\beta)/\alpha \rceil$.

2. Identify an $i$ for which $b_i = 1$ and set $\mathsf{params}^* = \mathsf{params}_i$; if no such $i$ exists, return $h \equiv 0$.

3. Let $G = \{j : \mathsf{params}_j = \mathsf{params}^*, b_j = 1\}$. Let $j^* \in G$ be an index with
$\mathsf{Comp}(\mathsf{params}^*, c_j, c_{j^*}) \in \{<, =, \bot\}$ for all $j \in G$.

4. Return $h$ defined by

\[ h(\mathsf{params}, c) = \begin{cases} 1 & \text{if } (\mathsf{params} = \mathsf{params}^*) \wedge (\mathsf{Comp}(\mathsf{params}^*, c, c_{j^*}) \in \{<, =\}) \\ 0 & \text{otherwise.} \end{cases} \]


Case 1: $\mathcal{D}$ places weight at least $\alpha$ on POS. Define $\hat{t} \in [N + 1]$ as the largest $\hat{t} \leq t$ such that
$\mathrm{error}_{\mathcal{D}}(f_{\hat{t},r}, f_{t,r}) \geq \alpha$. Such a $\hat{t}$ is guaranteed to exist since $f_{0,r}$ is the all-zeros function, and therefore
$\mathrm{error}_{\mathcal{D}}(f_{0,r}, f_{t,r})$ is equal to the weight $\mathcal{D}$ places on POS, which is at least $\alpha$.

Suppose $f_{\hat{t}+1,r} \leq h$ pointwise. Since $h$ has one-sided error (that is, $h \leq f_{t,r}$ pointwise), we have
$\mathrm{error}_{\mathcal{D}}(f_{\hat{t}+1,r}, f_{t,r}) = \mathrm{error}_{\mathcal{D}}(f_{\hat{t}+1,r}, h) + \mathrm{error}_{\mathcal{D}}(h, f_{t,r})$, or

\[ \mathrm{error}_{\mathcal{D}}(h, f_{t,r}) = \mathrm{error}_{\mathcal{D}}(f_{\hat{t}+1,r}, f_{t,r}) - \mathrm{error}_{\mathcal{D}}(f_{\hat{t}+1,r}, h) \leq \mathrm{error}_{\mathcal{D}}(f_{\hat{t}+1,r}, f_{t,r}) < \alpha. \]

Therefore, it suffices to show that $f_{\hat{t}+1,r} \leq h$ with probability at least $1 - \beta$. This is guaranteed
as long as $L$ receives a sample $(\mathsf{params}^r, c_i, 1)$ with $\hat{t} \leq \mathsf{Dec}(\mathsf{sk}^r, c_i) < t$. In other words,
$f_{t,r}(\mathsf{params}^r, c_i) = 1$ and $f_{\hat{t},r}(\mathsf{params}^r, c_i) = 0$. Since $f_{\hat{t},r} \leq f_{t,r}$ pointwise, such samples exactly
account for the error between $f_{\hat{t},r}$ and $f_{t,r}$. Thus since $\mathrm{error}_{\mathcal{D}}(f_{\hat{t},r}, f_{t,r}) \geq \alpha$, for each $i$ it must
be that $\hat{t} \leq \mathsf{Dec}(\mathsf{sk}^r, c_i) < t$ with probability at least $\alpha$. The learner $L$ therefore receives some
sample $c_i$ with $\hat{t} \leq \mathsf{Dec}(\mathsf{sk}^r, c_i) < t$ with probability at least $1 - (1 - \alpha)^n \geq 1 - \beta$ (since we took
$n \geq \log(1/\beta)/\alpha$).

Case 2: $\mathcal{D}$ places less than $\alpha$ weight on POS. Then the identically zero hypothesis has error at
most $\alpha$, so the claim holds because $0 \leq h \leq f_{t,r}$.

□ 
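The learner of Algorithm 1 uses nothing beyond the public comparison procedure, which a short sketch makes concrete. This is a minimal illustration under loud assumptions: the `comp` function below is a toy, insecure stand-in in which ciphertexts are plaintext integers, and the names `learn`, `params_star`, and `c_star` are hypothetical. The sketch also assumes the sample is consistent with some concept in the class, so that every positive example carries the same parameters.

```python
def comp(params, c0, c1):
    # Toy stand-in for Comp: ciphertexts are plaintext integers (no security).
    if c0 < c1:
        return "<"
    if c0 > c1:
        return ">"
    return "="

def learn(samples):
    """samples is a list of (params, ciphertext, label) triples."""
    positives = [(p, c) for (p, c, b) in samples if b == 1]
    if not positives:
        return lambda params, c: 0          # all-zeroes hypothesis
    params_star = positives[0][0]
    good = [c for (p, c) in positives if p == params_star]
    # Find the largest positive ciphertext using only the public comparison.
    c_star = good[0]
    for c in good[1:]:
        if comp(params_star, c_star, c) == "<":
            c_star = c

    def h(params, c):
        ok = params == params_star and comp(params_star, c, c_star) in ("<", "=")
        return int(ok)

    return h
```

With target threshold 10 and positive examples encrypting 3 and 7, the returned hypothesis accepts exactly the ciphertexts that compare at or below the one for 7. It can therefore only err by rejecting true positives such as 8, which is the one-sided error used in the proof.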


3.2 Hardness of Privately Learning EncThresh 

We now prove the hardness of privately learning EncThresh by constructing an example
reidentification scheme for this concept class. Recall that an example reidentification scheme consists of
two algorithms, $\mathsf{Gen}_{ex}$, which selects a distribution, a concept, and examples to give to a learner,
and $\mathsf{Trace}_{ex}$, which attempts to identify one of the examples the learner received.

Our example reidentification scheme yields a hard distribution even for weak-learning, where
the error parameter $\alpha$ is taken to be inverse-polynomially close to $1/2$.

Theorem 3.2. Let $\gamma(n)$ and $\xi(n)$ be noticeable functions. Let (Gen, Enc, Dec, Comp) be a statically
single-challenge secure ORE scheme. Then there exists an (efficient) $(\alpha = \frac{1}{2} - \gamma, \xi)$-example
reidentification scheme $(\mathsf{Gen}_{ex}, \mathsf{Trace}_{ex})$ for the concept class EncThresh.

We start with an informal description of the scheme $(\mathsf{Gen}_{ex}, \mathsf{Trace}_{ex})$. The algorithm $\mathsf{Gen}_{ex}$ sets
up the parameters of the ORE scheme, chooses the “middle” threshold concept corresponding to
$t = N/2$, and sets the distribution on examples to be encryptions of uniformly random messages
(together with the correct public parameters needed for comparison). Let $m_1 < m_2 < \cdots < m_n$
denote the sorted sequence of messages whose encryptions make up the sample produced by $\mathsf{Gen}_{ex}$
(with overwhelming probability, they are indeed distinct). We can thus break the plaintext space up
into buckets of the form $B_i = [m_i, m_{i+1})$. Suppose $L$ is a (weak) learner that produces a hypothesis $h$
with advantage $\gamma$ over random guessing. Such a hypothesis $h$ must be able to distinguish encryptions
of messages $m < t$ from encryptions of messages $m \geq t$ with advantage $\gamma$. Thus, there must be a
pair of adjacent buckets $B_{i-1}, B_i$ for which $h$ can distinguish encryptions of messages from $B_{i-1}$
from encryptions of messages from $B_i$ with advantage $\gamma/n$.

This observation leads to a natural definition for $\mathsf{Trace}_{ex}$: locate a pair of adjacent buckets
$B_{i-1}, B_i$ that $h$ distinguishes, and output the identity $i$ of the example separating those buckets.
Completeness of the resulting scheme, i.e. the fact that some example is reidentified when $L$
succeeds, follows immediately from the preceding discussion. We argue soundness, i.e. that an
example absent from $L$’s sample is not identified, by reducing to the static security of the ORE
scheme. The intuition is that if $L$ is not given example $i$, then it should not be able to distinguish
encryptions from bucket $B_{i-1}$ from encryptions from bucket $B_i$.

To make the security reduction somewhat more precise, suppose for the sake of contradiction
that there is an efficient algorithm $L$ that violates the soundness of $(\mathsf{Gen}_{ex}, \mathsf{Trace}_{ex})$ with noticeable
probability $\xi$. That is, there is some $i$ such that even without example $i$, the algorithm $L$ manages to
produce (with probability $\xi$) a hypothesis $h$ that distinguishes $B_{i-1}$ from $B_i$. A natural first attempt
to violate the security of the ORE is to construct an adversary that challenges on the message
sequences $m_1 < \cdots < m_{i-1} < m^{(L)} < m_{i+1} < \cdots < m_n$ and $m_1 < \cdots < m_{i-1} < m^{(R)} < m_{i+1} < \cdots < m_n$,
where $m^{(L)}$ is randomly chosen from $B_{i-1}$ and $m^{(R)}$ is randomly chosen from $B_i$. Then if $h$
can distinguish $B_{i-1}$ from $B_i$, the adversary can distinguish the two sequences. Unfortunately, this
approach fails for a somewhat subtle reason. The hypothesis $h$ is only guaranteed to distinguish
$B_{i-1}$ from $B_i$ with probability $\xi$. If $h$ fails to distinguish the buckets (or distinguishes them in the
opposite direction), then the adversary’s advantage is lost.

To overcome this issue, we instead rely on the security of the ORE for sequences that differ on
two messages. For the “left” challenge, our adversary samples two messages from the same randomly
chosen bucket, $B_{i-1}$ or $B_i$ (in addition to requesting encryptions of $m_1, \ldots, m_{i-1}, m_{i+1}, \ldots, m_n$). For
the “right” challenge, it samples one message from each bucket $B_{i-1}$ and $B_i$. Let $c^0$ and $c^1$ be the
ciphertexts corresponding to the challenge messages. If $h$ agrees on $c^0$ and $c^1$, then this suggests
the messages are from the same bucket, and the adversary should guess “left”. On the other hand,
if $h$ disagrees on $c^0$ and $c^1$, then the adversary should guess “right”. If $h$ distinguishes the buckets
$B_{i-1}$ and $B_i$, this adversary does strictly better than random guessing. On the other hand, even
if $h$ fails to distinguish the buckets, the adversary does at least as well as random guessing. So
overall, it still has a noticeable advantage at the ORE security game.

We now give the formal proof of Theorem 3.2. 

Proof. We construct an example reidentification scheme for EncThresh as follows. The algorithm
$\mathsf{Gen}_{ex}$ fixes the threshold $t = N/2$ and samples $(\mathsf{sk}^r, \mathsf{params}^r) \leftarrow_R \mathsf{Gen}(1^\lambda, 1^\ell)$, yielding a concept
$f_{t,r}$. Let $\mathcal{D}$ be the distribution of $(\mathsf{params}^r, \mathsf{Enc}(\mathsf{sk}^r, m))$ for uniformly random $m \in [N]$. Let
$m'_1, \ldots, m'_n \leftarrow_R [N]$, and let $m_1 < \cdots < m_n$ be the result of sorting the $m'_i$. Let $m_0 = 0$ and
$m_{n+1} = N$. Since $n = \mathrm{poly}(k) \ll N$, these random messages will be well-spaced. In particular, with
overwhelming probability, $|m_{i+1} - m_i| > 1$ for every $i$, so we assume this is the case in what follows.
$\mathsf{Gen}_{ex}$ then sets the samples to be $(x_1 = (\mathsf{params}^r, \mathsf{Enc}(\mathsf{sk}^r, m'_1)), \ldots, x_n = (\mathsf{params}^r, \mathsf{Enc}(\mathsf{sk}^r, m'_n)))$.
Let $x_0 = (\mathsf{params}^r, \mathsf{Enc}(\mathsf{sk}^r, m_0))$ be a “junk” example.

The algorithm $\mathsf{Trace}_{ex}$ creates buckets $B_i = [m_i, m_{i+1})$. For each $i$, let

\[ p_i = \Pr_{m \in B_i,\ \text{coins of } \mathsf{Enc}} [h(\mathsf{params}^r, \mathsf{Enc}(\mathsf{sk}^r, m)) = 1]. \]

By sampling random choices of $m$ in each bucket, $\mathsf{Trace}_{ex}$ can efficiently compute a good estimate
$\hat{p}_i \approx p_i$ for each $i$ (Lemma 3.3). It then accuses the least $i$ for which $\hat{p}_{i-1} - \hat{p}_i \geq \frac{\gamma}{n}$, and $\bot$ if none
is found.

Lemma 3.3. Let $K = \frac{8n^2}{\gamma^2} \log(9n/\xi)$. For each $i = 0, \ldots, n$, let

\[ \hat{p}_i = \frac{1}{K} \sum_{j=1}^{K} h(x_j) \]

where $x_j = (\mathsf{params}^r, \mathsf{Enc}(\mathsf{sk}^r, m_j))$ for i.i.d. $m_1, \ldots, m_K \leftarrow_R B_i$. Then $|\hat{p}_i - p_i| \leq \frac{\gamma}{4n}$ for every $i$
with probability at least $1 - \xi/4$.

Proof. By a Chernoff bound, the probability that any given $\hat{p}_i$ deviates from $p_i$ by more than $\frac{\gamma}{4n}$ is
at most $2\exp(-K\gamma^2/8n^2) < \frac{\xi}{4(n+1)}$. The lemma follows by a union bound. □

We first verify completeness for this scheme. Let $L$ be a learner for EncThresh using $n$ examples.
If the hypothesis $h$ produced by $L$ is $(\frac{1}{2} - \gamma)$-good, then there exist $i_0 < i_1$ such that $p_{i_0} - p_{i_1} \geq 2\gamma$.
If this is the case, then there must be an $i$ for which $p_{i-1} - p_i \geq \frac{2\gamma}{n}$. Then with probability all but
$\xi(n)/2$ over the estimates $\hat{p}_i$, we have $\hat{p}_{i-1} - \hat{p}_i \geq \frac{\gamma}{n}$, so some index is accused.

Now we verify soundness. Fix a PPT $L$, and let $j^* \in [n]$. Suppose $L$ violates the soundness of
the scheme with respect to $j^*$, i.e.

\[ \Pr_{h \leftarrow_R L(S_{-j^*}),\ \text{coins of } \mathsf{Gen}_{ex}} [\mathsf{Trace}_{ex}(h) = j^*] > \xi. \]

We will use L to construct an adversary A for the ORE scheme that succeeds with noticeable 
advantage. It suffices to build an adversary for the static (many-challenge) security of ORE, 
with Theorem 2.9 showing how to convert it to a single-challenge adversary. This many-challenge 
adversary is presented as Algorithm 2. (While not explicitly stated, the adversary should halt and 
output a random guess whenever the messages it samples are not well-spaced.) 

Let $i^*$ be such that $m_{i^*} = m'_{j^*}$. With probability at least $\xi$ over the parameters $(\mathsf{sk}^r, \mathsf{params}^r)$,
the choice of messages, the choice of the hypothesis $h$, and the coins of $\mathsf{Trace}_{ex}$, there is a gap
$\hat{p}_{i^*-1} - \hat{p}_{i^*} \geq \frac{\gamma}{n}$. Hence, by Lemma 3.3, there is a gap $p_{i^*-1} - p_{i^*} \geq \frac{\gamma}{2n}$ with probability at least $\frac{\xi}{2}$.

We now calculate the advantage of the adversary $\mathcal{A}$. Fix a hypothesis $h$. For notational
simplicity, let $p = p_{i^*-1}$ and let $q = p_{i^*}$. Let $y_0 = h(\mathsf{params}^r, c^0_{i^*})$ and $y_1 = h(\mathsf{params}^r, c^1_{i^*})$. Then
the adversary’s success probability is:

Algorithm 2 ORE adversary $\mathcal{A}$

1. Sample $m'_1, \ldots, m'_n \leftarrow_R [N]$, and let $m_1 < \cdots < m_n$ be the result of sorting the $m'_j$. Let $\pi$
be the permutation on $\{1, \ldots, n\}$ such that $m_{\pi(j)} = m'_j$. Let $m_0 = 0$. Let $i^* = \pi(j^*)$ so that
$m_{i^*} = m'_{j^*}$.

2. Construct pairs $(m^0_L, m^1_L)$ and $(m^0_R, m^1_R)$ as follows. Let $B_0 = (m_{i^*-1}, m_{i^*})$ and $B_1 =
(m_{i^*}, m_{i^*+1})$. Sample $m^0_L < m^1_L$ at random from the same $B_j$, for a random choice of
$j \in \{0, 1\}$. Sample $m^0_R \leftarrow_R B_0$ and $m^1_R \leftarrow_R B_1$.

3. Challenge on the pair of sequences $m_0, m_1, \ldots, m_{i^*-1}, m^0_L, m^1_L, m_{i^*+1}, \ldots, m_n$ and
$m_0, m_1, \ldots, m_{i^*-1}, m^0_R, m^1_R, m_{i^*+1}, \ldots, m_n$, receiving ciphertexts
$c_0, c_1, \ldots, c_{i^*-1}, c^0_{i^*}, c^1_{i^*}, c_{i^*+1}, \ldots, c_n$. For
$j \neq j^*$, let $c'_j = c_{\pi(j)}$ so that $c'_j$ is an encryption of $m'_j$.

4. Set $t = N/2$ and let

\[
\begin{aligned}
S_{-j^*} &= \{(\mathsf{params}^r, c'_1, \chi(m'_1 < t)), \ldots, (\mathsf{params}^r, c'_{j^*-1}, \chi(m'_{j^*-1} < t)), \\
&\qquad (\mathsf{params}^r, c_0, 1), (\mathsf{params}^r, c'_{j^*+1}, \chi(m'_{j^*+1} < t)), \ldots, (\mathsf{params}^r, c'_n, \chi(m'_n < t))\} \\
&= \{(\mathsf{params}^r, c_{\pi(1)}, \chi(m_{\pi(1)} < t)), \ldots, (\mathsf{params}^r, c_{\pi(j^*-1)}, \chi(m_{\pi(j^*-1)} < t)), \\
&\qquad (\mathsf{params}^r, c_0, 1), (\mathsf{params}^r, c_{\pi(j^*+1)}, \chi(m_{\pi(j^*+1)} < t)), \ldots, (\mathsf{params}^r, c_{\pi(n)}, \chi(m_{\pi(n)} < t))\}.
\end{aligned}
\]

Obtain $h \leftarrow_R L(S_{-j^*})$.

5. Guess $b' = 0$ if $h(\mathsf{params}^r, c^0_{i^*}) = h(\mathsf{params}^r, c^1_{i^*})$. Otherwise guess $b' = 1$.


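\[
\begin{aligned}
\Pr[b' = b] &= \tfrac{1}{2}\left(\Pr[y_0 = y_1 \mid b = 0] + \Pr[y_0 \neq y_1 \mid b = 1]\right) \\
&= \tfrac{1}{2}\left(\tfrac{1}{2}\left(p^2 + (1 - p)^2 + q^2 + (1 - q)^2\right) + \left(1 - pq - (1 - p)(1 - q)\right)\right) \\
&= \tfrac{1}{2} + \tfrac{1}{2}(p - q)^2.
\end{aligned}
\]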

Thus if $p - q \geq \frac{\gamma}{2n}$ then the adversary’s advantage is at least $\frac{\gamma^2}{8n^2}$. On the other hand, even for
arbitrary values of $p, q$, the advantage is still nonnegative. Therefore, the advantage of the strategy
is at least $\frac{\xi\gamma^2}{16n^2} - \mathrm{negl}(k)$ (the $\mathrm{negl}(k)$ term coming from the assumption that the $m'_i$ sampled were
distinct), which is a noticeable function of the parameter $k$. This contradicts the static security of
the ORE scheme.

□ 
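The closed form for the adversary's success probability can be sanity-checked with exact rational arithmetic. The sketch below is only a numerical check of the algebra, with hypothetical function and variable names: $p$ and $q$ are the acceptance probabilities of $h$ on the two buckets, the "left" challenge draws both messages from one bucket chosen uniformly, and the "right" challenge draws one message from each bucket.

```python
from fractions import Fraction

def success_probability(p, q):
    """Probability that the agree/disagree guessing rule outputs b' = b."""
    half = Fraction(1, 2)
    # b = 0: both messages come from one uniformly chosen bucket; guess is
    # correct when h agrees on the two ciphertexts.
    agree_same_bucket = half * (p * p + (1 - p) ** 2 + q * q + (1 - q) ** 2)
    # b = 1: one message from each bucket; guess is correct when h disagrees.
    differ_cross_bucket = 1 - p * q - (1 - p) * (1 - q)
    return half * (agree_same_bucket + differ_cross_bucket)
```

On any pair of rational inputs in $[0, 1]$ this agrees with $\frac{1}{2} + \frac{1}{2}(p - q)^2$, so the rule never does worse than random guessing and gains exactly half the squared gap.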


3.3 The SQ Learnability of EncThresh 

The statistical query (SQ) model is a natural restriction of the PAC model by which a learner
is able to measure statistical properties of its examples, but cannot see the individual examples
themselves. We recall the definition of an SQ learner.

Definition 3.4 (SQ learning [Kea98]). Let $c : X \to \{0,1\}$ be a target concept and let $\mathcal{D}$ be
a distribution over $X$. In the SQ model, a learner is given access to a statistical query oracle
$\mathrm{STAT}(c, \mathcal{D})$. It may make queries to this oracle of the form $(\psi, \tau)$, where $\psi : X \times \{0,1\} \to \{0,1\}$ is
a query function and $\tau \in (0,1)$ is an error tolerance. The oracle $\mathrm{STAT}(c, \mathcal{D})$ responds with a value
$v$ such that $|v - \Pr_{x \in \mathcal{D}}[\psi(x, c(x)) = 1]| \leq \tau$. The goal of a learner is to produce, with probability
at least $1 - \beta$, a hypothesis $h : X \to \{0,1\}$ such that $\mathrm{error}_{\mathcal{D}}(c, h) \leq \alpha$. The query functions must
be efficiently evaluable, and the tolerance $\tau$ must be lower bounded by an inverse polynomial in $k$
and $1/\alpha$.

The query complexity of a learner is the worst-case number of queries it issues to the statistical
query oracle. An SQ learner is efficient if it also runs in time polynomial in $k$, $1/\alpha$, $1/\beta$.

Feldman and Kanade [FK12] investigated the relationship between query complexity and
computational complexity for SQ learners. They exhibited a concept class $\mathcal{C}$ which is efficiently PAC
learnable and SQ learnable with polynomially many queries, but assuming $\mathsf{NP} \neq \mathsf{RP}$, is not
efficiently SQ learnable. Concepts in this concept class take the form

\[ g_{\varphi,y}(x, x') = \begin{cases} \mathrm{PAR}_y(x') & \text{if } x = \varphi \\ 0 & \text{otherwise.} \end{cases} \]

Here, $\mathrm{PAR}_y(x')$ is the inner product of $y$ and $x'$ modulo 2. The concept class $\mathcal{C}$ consists of $g_{\varphi,y}$
where $\varphi$ is a satisfiable 3-CNF formula and $y$ is the lexicographically first satisfying assignment
to $\varphi$. The efficient PAC learner for parities based on Gaussian elimination shows that $\mathcal{C}$ is also
efficiently PAC learnable. It is also (inefficiently) SQ learnable with polynomially many queries:
either the all-zeroes hypothesis is good, or an SQ learner can recover the formula $\varphi$ bit-by-bit and
determine the satisfying assignment $y$ by brute force. On the other hand, because parities are
information-theoretically hard to SQ learn, the satisfying assignment $y$ remains hidden to an SQ
learner unless it is able to solve 3-SAT.

In this section, we show that the concept class EncThresh shares these properties with C. Namely, 
we know that EncThresh is efficiently PAC learnable and because it is not efficiently privately 
learnable, it is not efficiently SQ learnable [BDMN05]. We can also show that EncThresh has an SQ 
learner with polynomial query complexity. Making this observation about EncThresh is of interest 
because the hardness of SQ learning EncThresh does not seem to be related to the (information- 
theoretic) hardness of SQ learning parities. 

Proposition 3.5. The concept class EncThresh is (inefficiently) SQ learnable with polynomially 
many queries. 

As with $\mathcal{C}$ there are two cases. In the first case, the target distribution places nearly zero weight
on examples with $\mathsf{params} = \mathsf{params}^r$, and so the all-zeroes hypothesis is good. In the second case, the
target distribution places noticeable weight on these examples, and our learner can use statistical
queries to recover the comparison parameters $\mathsf{params}^r$ bit-by-bit. Once the public parameters are
recovered, our learner can determine a corresponding secret key by brute force. Lemma 3.6 below
shows that any corresponding secret key, even one that is not actually $\mathsf{sk}^r$, suffices. The learner
can then use binary search to determine the threshold value $t$.

Proof. Let $f_{t,r}$ be the target concept, $\mathcal{D}$ be the target distribution, and $\alpha$ be the target error rate.
With the statistical query $((x, b) \mapsto b, \alpha/4)$, we can determine whether the all-zeroes hypothesis is
accurate. That is, if we receive a value that is less than $\alpha/2$, then $\Pr_{x \in \mathcal{D}}[f_{t,r}(x) = 1] \leq \alpha$. If not,
then we know that $\Pr_{x \in \mathcal{D}}[f_{t,r}(x) = 1] \geq \alpha/4$, so $\mathcal{D}$ places significant weight on examples prefixed
with $\mathsf{params}^r$. Suppose now that we are in the latter case.

Let $m = |\mathsf{params}|$. For $i = 1, \ldots, m$, define $\psi_i(\mathsf{params}, c, b) = 1$ if $\mathsf{params}_i = 1$ and $b = 1$, and
$\psi_i(\mathsf{params}, c, b) = 0$ otherwise. Then by asking the queries $(\psi_i, \alpha/16)$, we can determine each bit
$\mathsf{params}^r_i$ of $\mathsf{params}^r$.

Now by brute force search, we determine a secret key $\mathsf{sk}$ for which $(\mathsf{sk}, \mathsf{params}^r) \in \mathrm{Range}(\mathsf{Gen})$.
The recovered secret key $\mathsf{sk}$ may not necessarily be the same as $\mathsf{sk}^r$. However, the following lemma
shows that $\mathsf{sk}$ and $\mathsf{sk}^r$ are functionally equivalent:

Lemma 3.6. Suppose (Gen, Enc, Dec, Comp) is a strongly correct ORE scheme. Then for any pair
$(\mathsf{sk}_1, \mathsf{params}), (\mathsf{sk}_2, \mathsf{params}) \in \mathrm{Range}(\mathsf{Gen})$, we have that $\mathsf{Dec}_{\mathsf{sk}_1}(c) = \mathsf{Dec}_{\mathsf{sk}_2}(c)$ for all ciphertexts $c$.

With the secret key $\mathsf{sk}$ in hand, we now conduct a binary search for the threshold $t$. Recall that
we have an estimate $v$ for the weight that $f_{t,r}$ places on positive examples, i.e. $|v - \Pr_{x \in \mathcal{D}}[f_{t,r}(x) =
1]| \leq \alpha/4$. Starting at $t_1 = N/2$, we issue the query $(\varphi_1, \alpha/4)$ where $\varphi_1(\mathsf{params}, c, b) = 1$ iff
$\mathsf{params} = \mathsf{params}^r$ and $\mathsf{Dec}(\mathsf{sk}, c) < t_1$. Let $h_{t_1}$ denote the hypothesis

\[ h_{t_1}(\mathsf{params}, c) = \begin{cases} 1 & \text{if } (\mathsf{params} = \mathsf{params}^r) \wedge (\mathsf{Dec}(\mathsf{sk}, c) \neq \bot) \wedge (\mathsf{Dec}(\mathsf{sk}, c) < t_1) \\ 0 & \text{otherwise.} \end{cases} \]

Thus, the query $(\varphi_1, \alpha/4)$ approximates the weight $h_{t_1}$ places on positive examples. Let the answer
to this query be $v_1$. If $|v_1 - v| \leq \alpha/2$, then we can halt and output the good hypothesis $h_{t_1}$.
Otherwise, if $v_1 < v - \alpha/2$, we set the next threshold to $t_2 = 3N/4$, and if $v_1 > v + \alpha/2$, we set
the next threshold to $t_2 = N/4$. We recurse up to $\log N = \ell = \mathrm{poly}(k)$ times, yielding a good
hypothesis for $f_{t,r}$. □
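The binary search at the end of this proof can be simulated against a noisy oracle. The sketch below is a simplified illustration under stated assumptions: plaintexts are indexed $0, \ldots, N-1$ rather than $1, \ldots, N$, decryption and the parameter check are abstracted away so that a query is identified with its candidate threshold, and the oracle (hypothetical name `make_stat_oracle`) perturbs each answer by exactly plus or minus the tolerance to model worst-case noise. The search uses standard midpoint updates, which begin at $N/2$ as in the text.

```python
from fractions import Fraction

def make_stat_oracle(mass, noise_signs):
    """Simulated statistical query oracle for threshold queries.

    mass[m] is the probability of plaintext m. Each answer is the true
    weight below the queried threshold, shifted by +tol or -tol according
    to the repeating pattern noise_signs.
    """
    state = {"calls": 0}

    def weight_below(threshold, tol):
        sign = noise_signs[state["calls"] % len(noise_signs)]
        state["calls"] += 1
        return sum(mass[:threshold], Fraction(0)) + sign * tol

    return weight_below

def sq_threshold_search(weight_below, v, n, alpha):
    """Binary search for a threshold whose weight is within alpha of the target.

    v is the oracle's estimate (tolerance alpha/4) of the target weight.
    """
    lo, hi = 0, n
    while lo <= hi:
        t1 = (lo + hi) // 2
        v1 = weight_below(t1, alpha / 4)
        if abs(v1 - v) <= alpha / 2:
            return t1                  # good hypothesis found
        if v1 < v - alpha / 2:
            lo = t1 + 1                # candidate threshold is too small
        else:
            hi = t1 - 1                # candidate threshold is too large
    raise AssertionError("unreachable if oracle answers respect the tolerance")
```

Because both estimates are within $\alpha/4$ of the truth, halting with $|v_1 - v| \leq \alpha/2$ guarantees that the true weights differ by at most $\alpha$, and a larger gap reveals the correct search direction.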

Proof of Lemma 3.6. Suppose the lemma is not true. First suppose that there exists a ciphertext
$c$ such that $\mathsf{Dec}(\mathsf{sk}_1, c) = p_1 < p_2 = \mathsf{Dec}(\mathsf{sk}_2, c)$. Let $c' \in \mathsf{Enc}(\mathsf{sk}_1, p_2)$. Then by strong correctness
applied to the parameters $(\mathsf{sk}_1, \mathsf{params})$, we must have $\mathsf{Comp}(\mathsf{params}, c, c')$ = “<”. Now by strong
correctness applied to $(\mathsf{sk}_2, \mathsf{params})$, we must have $\mathsf{Dec}(\mathsf{sk}_2, c') > p_2$. Thus, $p_1 < \mathsf{Dec}(\mathsf{sk}_1, c') =
p_2 < \mathsf{Dec}(\mathsf{sk}_2, c')$. Repeating this argument, we obtain a contradiction because the message space
is finite.

Now suppose instead that there is a ciphertext $c$ for which $\mathsf{Dec}(\mathsf{sk}_1, c) = p \in [N]$, but $\mathsf{Dec}(\mathsf{sk}_2, c) =
\bot$. Let $c' \in \mathsf{Enc}(\mathsf{sk}_1, p')$ for some $p' > p$. Then $\mathsf{Comp}(\mathsf{params}, c, c')$ = “<” by strong correctness
applied to $(\mathsf{params}, \mathsf{sk}_1)$. But $\mathsf{Comp}(\mathsf{params}, c, c') = \bot$ by strong correctness applied to $(\mathsf{params}, \mathsf{sk}_2)$,
again yielding a contradiction. □

4 ORE with Strong Correctness 

We now explain how to obtain ORE with strongly correct comparison, as all prior ORE schemes
only satisfy the weaker notion of correctness. The lack of strong correctness is easiest to see with
the scheme of Boneh et al. [BLR+15]. The protocol is built from current multilinear map
constructions, which are noisy. If the noise terms grow too large, the correctness of the multilinear map
is not guaranteed. The comparison function in [BLR+15] is computed by performing multilinear
operations, and for correctly generated ciphertexts, the operations will give the right answer.
However, there exist ciphertexts, namely those with very large noise, for which the comparison function
gives an incorrect output. The result is that the comparison operation is not guaranteed to be
consistent with decrypting the ciphertexts and comparing the plaintexts.

As described in the introduction, we give a generic conversion from any ORE scheme with weakly
correct comparison into a strongly correct scheme. We simply modify the encryption algorithm by
adding a non-interactive zero-knowledge (NIZK) proof that the resulting ciphertext is well-formed.
Then the decryption and comparison procedures check the proof(s), and only output a non-$\bot$ result
(either decryption or comparison) if the proof(s) are valid.

Instantiating our scheme. In our construction, we need the (weak) correctness of the underlying
ORE scheme to hold with probability one. However, the existing protocols only have correctness
with overwhelming probability, so some minor adjustments need to be made to the protocols. This
is easiest to see in the ORE scheme of Boneh et al. [BLR+15]. The Boneh et al. scheme uses
noisy multilinear maps [GGH13a] which may introduce errors. Therefore, the protocol described
in [BLR+15] only achieves the (weak) correctness property with overwhelming probability, whereas
we will require (weak) correctness with probability 1 for the conversion. However, it is
straightforward to generate the parameters for the protocol in such a way as to completely eliminate errors.
Essentially, the parameters in the protocol have an error term that is generated by a (discrete) 
Gaussian distribution, which has unbounded support. Instead, we truncate the Gaussian, resulting 
in a noise distribution with bounded support. By truncating sufficiently far from the center, the 
resulting distribution is also statistically close to the full Gaussian, so security of the protocol with 
truncated noise follows from the security of the protocol with un-truncated noise. By truncating 
the noise distribution, it is straightforward to set parameters so that no errors can occur. 

It is similarly straightforward to modify current obfuscation candidates, which are also built 
from multilinear maps, to obtain perfect (weak) correctness by truncating the noise distributions. 
Thus, our scheme has instantiations using multilinear maps or iO. 

4.1 Conversion from Weakly Correct ORE 

We describe our generic conversion from an order-revealing encryption scheme with weak correctness
using NIZKs. We will need the following additional tools:

Perfectly binding commitments. A perfectly binding commitment Com is a randomized
algorithm with two properties. The first is perfect binding, which states that if $\mathsf{Com}(m; r) =
\mathsf{Com}(m'; r')$, then $m = m'$. The second requirement is computational hiding, which states that
the distributions $\mathsf{Com}(m)$ and $\mathsf{Com}(m')$ are computationally indistinguishable for any messages
$m, m'$. Such commitments can be built, say, from any injective one-way function.

Perfectly sound NIZK. A NIZK protocol consists of three algorithms:

• $\mathsf{Setup}(1^\lambda)$ is a randomized algorithm that outputs a common reference string $\mathsf{crs}$.

• $\mathsf{Prove}(\mathsf{crs}, x, w)$ takes as input a common reference string $\mathsf{crs}$, an NP statement $x$, and a
witness $w$, and produces a proof $\pi$.

• $\mathsf{Ver}(\mathsf{crs}, x, \pi)$ takes as input a common reference string $\mathsf{crs}$, statement $x$, and a proof $\pi$, and
outputs either accept or reject.

We make three requirements for a NIZK:

• Perfect Completeness. For all security parameters $\lambda$ and any true statement $x$ with witness
$w$,

\[ \Pr[\mathsf{Ver}(\mathsf{crs}, x, \pi) = \text{accept} : \mathsf{crs} \leftarrow \mathsf{Setup}(1^\lambda);\ \pi \leftarrow \mathsf{Prove}(\mathsf{crs}, x, w)] = 1. \]

• Perfect Soundness. For all security parameters $\lambda$, any false statement $x$ and any (invalid)
proof $\pi$,

\[ \Pr[\mathsf{Ver}(\mathsf{crs}, x, \pi) = \text{accept} : \mathsf{crs} \leftarrow \mathsf{Setup}(1^\lambda)] = 0. \]

• Computational Zero Knowledge. There exists a simulator $(S_1, S_2)$ such that for any
computationally bounded adversary $\mathcal{A}$, the quantity

\[ \left| \Pr[\mathcal{A}^{\mathsf{Prove}(\mathsf{crs}, \cdot, \cdot)}(\mathsf{crs}) = 1 : \mathsf{crs} \leftarrow \mathsf{Setup}(1^\lambda)] - \Pr[\mathcal{A}^{\mathsf{Sim}(\mathsf{crs}, \tau, \cdot, \cdot)}(\mathsf{crs}) = 1 : (\mathsf{crs}, \tau) \leftarrow S_1(1^\lambda)] \right| \]

is negligible, where $\mathsf{Sim}(\mathsf{crs}, \tau, x, w)$ outputs $S_2(\mathsf{crs}, \tau, x)$ if $w$ is a valid witness for $x$, and
$\mathsf{Sim}(\mathsf{crs}, \tau, x, w) = \bot$ if $w$ is invalid.

NIZKs satisfying these requirements can be built from bilinear maps [GOS12]. 


4.1.1 The Construction 

We now give our conversion. Let $(\mathsf{Setup}, \mathsf{Prove}, \mathsf{Ver})$ be a perfectly sound NIZK and let 
$(\mathsf{Gen}', \mathsf{Enc}', \mathsf{Dec}', \mathsf{Comp}')$ be an ORE with weakly correct comparison. We will assume that $\mathsf{Enc}'$ is 
deterministic; if not, we can derandomize $\mathsf{Enc}'$ using a pseudorandom function. Let $\mathsf{Com}$ be a perfectly binding commitment. 

We construct a new ORE scheme (Gen, Enc, Dec, Comp) with strongly correct comparison: 

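• $\mathsf{Gen}(1^\lambda, 1^\ell)$: run $(\mathsf{sk}', \mathsf{params}') \leftarrow \mathsf{Gen}'(1^\lambda, 1^\ell)$. Let $\sigma = \mathsf{Com}(\mathsf{sk}'; r)$ for randomness $r$, and 
run $\mathsf{crs} \leftarrow \mathsf{Setup}(1^\lambda)$. Then the secret key is $\mathsf{sk} = (\mathsf{sk}', r, \mathsf{crs})$ and the public parameters are 
$\mathsf{params} = (\mathsf{params}', \sigma, \mathsf{crs})$. 

• $\mathsf{Enc}(\mathsf{sk}, m)$: Compute $c' = \mathsf{Enc}'(\mathsf{sk}', m)$. Let $x_{c'}$ be the statement $\exists\, m, \mathsf{sk}', r : \sigma = \mathsf{Com}(\mathsf{sk}'; r) \wedge 
c' = \mathsf{Enc}'(\mathsf{sk}', m)$. Run $\pi_{c'} = \mathsf{Prove}(\mathsf{crs}, x_{c'}, (m, \mathsf{sk}', r))$. Output the ciphertext $c = (c', \pi_{c'})$. 

• $\mathsf{Dec}(\mathsf{sk}, c)$: Write $c = (c', \pi_{c'})$. If $\mathsf{Ver}(\mathsf{crs}, x_{c'}, \pi_{c'}) = \text{reject}$, output $\bot$. Otherwise, output 
$m = \mathsf{Dec}'(\mathsf{sk}', c')$. 

• $\mathsf{Comp}(\mathsf{params}, c_0, c_1)$: Write $c_b = (c'_b, \pi_{c'_b})$ and $\mathsf{params} = (\mathsf{params}', \sigma, \mathsf{crs})$. If $\mathsf{Ver}(\mathsf{crs}, x_{c'_b}, \pi_{c'_b}) = 
\text{reject}$ for either $b \in \{0, 1\}$, then output $\bot$. Otherwise, output $\mathsf{Comp}'(\mathsf{params}', c'_0, c'_1)$. 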
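The control flow of the four algorithms above can be sketched in Python. This is a structural sketch only, under loudly stated assumptions: every primitive below is a hypothetical, insecure stand-in chosen so that the code runs with the standard library. In particular, `enc_inner` hides nothing, `commit` is a hash rather than a perfectly binding commitment, and `prove`/`verify` are a MAC under a secret `crs` rather than a publicly verifiable, perfectly sound NIZK. The inner public parameters are omitted.

```python
import hashlib
import hmac
import os

BOTTOM = None  # stands for the failure symbol


# --- Hypothetical toy stand-ins: insecure, for control flow only ---
def enc_inner(sk_inner: bytes, m: int) -> bytes:
    """Deterministic Enc'. This toy ignores the key and hides nothing."""
    return m.to_bytes(4, "big")


def dec_inner(sk_inner: bytes, c_inner: bytes) -> int:
    return int.from_bytes(c_inner, "big")


def comp_inner(c0: bytes, c1: bytes) -> str:
    """Comp'. Only trusted on honestly generated inner ciphertexts."""
    return "<" if c0 < c1 else ("=" if c0 == c1 else ">")


def commit(value: bytes, r: bytes) -> bytes:
    """Hash-based stand-in for Com (not perfectly binding)."""
    return hashlib.sha256(r + value).digest()


def prove(crs: bytes, sigma: bytes, c_inner: bytes) -> bytes:
    """MAC stand-in for Prove. A real NIZK has a public crs; here crs is secret."""
    return hmac.new(crs, sigma + c_inner, hashlib.sha256).digest()


def verify(crs: bytes, sigma: bytes, c_inner: bytes, proof: bytes) -> bool:
    return hmac.compare_digest(prove(crs, sigma, c_inner), proof)


# --- The wrapper, following the four bullets above ---
def gen():
    sk_inner, r, crs = os.urandom(16), os.urandom(16), os.urandom(16)
    sigma = commit(sk_inner, r)
    # sigma is cached in sk for convenience; it is determined by (sk_inner, r).
    return (sk_inner, r, crs, sigma), (sigma, crs)  # (sk, params)


def enc(sk, m: int):
    sk_inner, _r, crs, sigma = sk
    c_inner = enc_inner(sk_inner, m)
    return c_inner, prove(crs, sigma, c_inner)


def dec(sk, c):
    sk_inner, _r, crs, sigma = sk
    c_inner, proof = c
    if not verify(crs, sigma, c_inner, proof):
        return BOTTOM
    return dec_inner(sk_inner, c_inner)


def comp(params, c0, c1):
    sigma, crs = params
    for c_inner, proof in (c0, c1):
        if not verify(crs, sigma, c_inner, proof):
            return BOTTOM  # reject if either proof fails
    return comp_inner(c0[0], c1[0])
```

The point of the sketch is the shape of `comp` and `dec`: both reject on exactly the same ciphertexts, so the inner comparison is only ever run on inputs that carry a valid proof.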

Correctness. Notice that, for each plaintext $m$, the ciphertext component $c' = \mathsf{Enc}'(\mathsf{sk}', m)$ is 
the unique value such that $\mathsf{Dec}(\mathsf{sk}, (c', \pi)) = m$ for some proof $\pi$. Moreover, the completeness of the 
zero knowledge proof implies that $\mathsf{Enc}(\mathsf{sk}, m)$ outputs a valid proof. Decryption correctness follows. 

For strong comparison correctness, consider two ciphertexts $c_0, c_1$ where $c_b = (c'_b, \pi_{c'_b})$. 
Suppose both proofs $\pi_{c'_b}$ are valid, which means that verification passes when running $\mathsf{Comp}$ and so 
$\mathsf{Comp}(\mathsf{params}, c_0, c_1) = \mathsf{Comp}'(\mathsf{params}', c'_0, c'_1)$. Verification also passes when decrypting $c_b$, and so 
$\mathsf{Dec}(\mathsf{sk}, c_b) = \mathsf{Dec}'(\mathsf{sk}', c'_b)$. 

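Since the proofs are valid, $c'_b = \mathsf{Enc}'(\mathsf{sk}', m_b)$ for some $m_b$, for both $b = 0, 1$. The weak correctness 
of comparison for $(\mathsf{Gen}', \mathsf{Enc}', \mathsf{Dec}', \mathsf{Comp}')$ implies that $\mathsf{Comp}'(\mathsf{params}', c'_0, c'_1) = \mathsf{Comp}_{\mathrm{plain}}(m_0, m_1)$. 
The decryption correctness of $(\mathsf{Gen}', \mathsf{Enc}', \mathsf{Dec}', \mathsf{Comp}')$ then implies that $\mathsf{Dec}'(\mathsf{sk}', c'_b) = m_b$, and 
therefore $\mathsf{Dec}(\mathsf{sk}, c_b) = m_b$. Thus $\mathsf{Comp}_{\mathrm{ciph}}(\mathsf{sk}, c_0, c_1) = \mathsf{Comp}_{\mathrm{plain}}(m_0, m_1)$. Putting it all together, 
$\mathsf{Comp}(\mathsf{params}, c_0, c_1) = \mathsf{Comp}_{\mathrm{ciph}}(\mathsf{sk}, c_0, c_1)$, as desired. 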

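Now suppose one of the proofs $\pi_{c'_b}$ is invalid. Then $\mathsf{Comp}(\mathsf{params}, c_0, c_1) = \bot$ and $\mathsf{Dec}(\mathsf{sk}, c_b) = 
\bot$. This means $\mathsf{Comp}_{\mathrm{ciph}}(\mathsf{sk}, c_0, c_1) = \bot = \mathsf{Comp}(\mathsf{params}, c_0, c_1)$, as desired. 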

Security. To prove security, we first use the zero-knowledge simulator to simulate the proofs $\pi_{c'}$ 
without using a witness (namely, the secret decryption key). Then we use the hiding property 
of the commitment to replace $\sigma$ with a commitment to 0. At this point, the entire game can be 
simulated using an $\mathsf{Enc}'$ oracle, and so the security reduces to the security of $\mathsf{Enc}'$. 

Theorem 4.1. If $(\mathsf{Gen}', \mathsf{Enc}', \mathsf{Dec}', \mathsf{Comp}')$ is a (statically) secure ORE, $(\mathsf{Setup}, \mathsf{Prove}, \mathsf{Ver})$ is 
computationally zero knowledge, and $\mathsf{Com}$ is computationally hiding, then $(\mathsf{Gen}, \mathsf{Enc}, \mathsf{Dec}, \mathsf{Comp})$ is a 
statically secure ORE. 

Proof. We will prove security through a sequence of hybrids. Let $\mathcal{A}$ be an adversary with advantage 
$\varepsilon$ in breaking the static security of $(\mathsf{Gen}, \mathsf{Enc}, \mathsf{Dec}, \mathsf{Comp})$. 

Hybrid 0. This is the real experiment, where $\sigma \leftarrow \mathsf{Com}(\mathsf{sk}')$, $\mathsf{crs} \leftarrow \mathsf{Setup}(1^\lambda)$, and the proofs 
$\pi_{c'}$ are answered using $\mathsf{Prove}$ and valid witnesses. $\mathcal{A}$ has advantage $\varepsilon$ in distinguishing the left and 
right ciphertexts. 

Hybrid 1. This is the same as Hybrid 0, except that $\mathsf{crs}$ is generated as $(\mathsf{crs}, \tau) \leftarrow \mathcal{S}_1(1^\lambda)$, and 
all proofs are generated using $\mathcal{S}_2(\mathsf{crs}, \tau, \cdot)$. The zero knowledge property of $(\mathsf{Setup}, \mathsf{Prove}, \mathsf{Ver})$ shows 
that this is indistinguishable from Hybrid 0. 

Hybrid 2. This is the same as Hybrid 1, except that $\sigma \leftarrow \mathsf{Com}(0)$. Since the randomness for 
computing $\sigma$ is not needed for simulation, this change is undetectable using the hiding of $\mathsf{Com}$. 

Thus the advantage of $\mathcal{A}$ in Hybrid 2 is at least $\varepsilon - \mathrm{negl}$ for some negligible function $\mathrm{negl}$. Now 
consider the following adversary $\mathcal{B}$ that attempts to break the security of $(\mathsf{Gen}', \mathsf{Enc}', \mathsf{Dec}', \mathsf{Comp}')$. 
$\mathcal{B}$ simulates $\mathcal{A}$, and forwards the message sequences $m_1^{(L)} < m_2^{(L)} < \cdots < m_q^{(L)}$ and 
$m_1^{(R)} < m_2^{(R)} < \cdots < m_q^{(R)}$ produced by $\mathcal{A}$ to its own challenger. In response, it receives $\mathsf{params}'$ and ciphertexts $c'_i$, 
where $c'_i$ encrypts either $m_i^{(L)}$ if $b = 0$ or $m_i^{(R)}$ if $b = 1$, for a random bit $b$ chosen by the challenger. 

$\mathcal{B}$ now generates $\sigma \leftarrow \mathsf{Com}(0)$ and $(\mathsf{crs}, \tau) \leftarrow \mathcal{S}_1(1^\lambda)$, and lets $\mathsf{params} = (\mathsf{params}', \sigma, \mathsf{crs})$. It also 
computes $\pi_{c'_i} \leftarrow \mathcal{S}_2(\mathsf{crs}, \tau, x_{c'_i})$, defines $c_i = (c'_i, \pi_{c'_i})$, and gives $\mathsf{params}$ and the $c_i$ to $\mathcal{A}$. Finally, 
when $\mathcal{A}$ outputs a guess $b'$ for $b$, $\mathcal{B}$ outputs the same guess $b'$. 

We see that the view of $\mathcal{A}$ as a subroutine of $\mathcal{B}$ is exactly the same view as in Hybrid 2. Thus, 
$b' = b$ with probability at least $\varepsilon - \mathrm{negl}$. The security of $(\mathsf{Gen}', \mathsf{Enc}', \mathsf{Dec}', \mathsf{Comp}')$ implies that this 
quantity, and hence $\varepsilon$, must be negligible. Thus $\mathcal{A}$ must have negligible advantage in breaking the 
security of $(\mathsf{Gen}, \mathsf{Enc}, \mathsf{Dec}, \mathsf{Comp})$. 

□ 
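
As a reading aid, the hybrid argument can be summarized by the following accounting. The symbols $p_i$ (the advantage of the adversary in Hybrid $i$), $\mathrm{negl}_{\mathrm{zk}}$, $\mathrm{negl}_{\mathrm{hide}}$ and $\mathrm{Adv}_{\mathcal{B}}$ are introduced here for illustration only and are not notation from the proof itself.

```latex
\begin{align*}
p_0 &= \varepsilon, \\
|p_0 - p_1| &\le \mathrm{negl}_{\mathrm{zk}}(\lambda)
  && \text{(computational zero knowledge)}, \\
|p_1 - p_2| &\le \mathrm{negl}_{\mathrm{hide}}(\lambda)
  && \text{(computational hiding of } \mathsf{Com}\text{)}, \\
p_2 &= \mathrm{Adv}_{\mathcal{B}}(\lambda)
  && \text{(} \mathcal{B} \text{ reproduces the view of Hybrid 2)}, \\
\varepsilon &\le \mathrm{Adv}_{\mathcal{B}}(\lambda)
  + \mathrm{negl}_{\mathrm{zk}}(\lambda) + \mathrm{negl}_{\mathrm{hide}}(\lambda).
\end{align*}
```

Since each term on the right-hand side of the last line is negligible under the hypotheses of Theorem 4.1, so is the left-hand side.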
5 A Separation for Representation Learning 


In this section, we show how to construct a concept class ValidSig that separates efficient 
representation learning from efficient private representation learning, assuming only the existence of 
one-way functions. Here by “representation learning” we mean a restricted form of proper learning 
where a learner must output a particular representation (i.e. encoding) of a hypothesis h in 
the concept class C. As with proper learning, this is a natural syntactic restriction to place on 
a learner: for instance, if one wants to learn linear threshold functions (LTF), it makes sense to 
require a learner to produce the actual coefficients of an LTF, rather than an arbitrary circuit that 
happens to compute an LTF. 

The construction is based on the following elegant idea due to Kobbi Nissim [Nis14]. Suppose 
$H : D \to R$ is a cryptographic hash function with the property that given $x_1, \ldots, x_n$ with $y = 
H(x_1) = \cdots = H(x_n)$, it is infeasible for an efficient adversary to find another $x$ for which $H(x) = y$. 
Consider the concept class HashPoint consisting of the concepts 

$$f_x(x') = \begin{cases} 1 & \text{if } H(x) = H(x') \\ 0 & \text{otherwise} \end{cases}$$ 


for every $x \in D$. The representation of a concept $f_x$ is the point $x$. The concept class HashPoint is 
very easy to learn (by representation) without privacy: a learner can identify any positive example 
$x_i$ and output the representation $x_i$. Since $H(x_i) = H(x)$, the concept $f_{x_i}$ is actually equal to the 
target concept $f_x$. On the other hand, a learner that identifies a representation $x^*$ for which $f_{x^*} = f_x$ 
cannot be differentially private, since the security of the hash function means it is infeasible to 
produce such an $x^*$ that is not present in the sample. 
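
The non-private learner described above can be sketched as follows. This is a minimal illustration, assuming a toy hash: `H` below is SHA-256 reduced to two bits so that collisions are easy to exhibit, which means it does not have the hardness property the argument requires. The function names are hypothetical.

```python
import hashlib


def H(x: int) -> int:
    """Toy 2-bit hash (assumption for illustration; trivially insecure)."""
    return hashlib.sha256(x.to_bytes(8, "big")).digest()[0] % 4


def concept(x: int):
    """The concept f_x, with f_x(x') = 1 iff H(x) == H(x')."""
    return lambda x_prime: int(H(x) == H(x_prime))


def learn(sample):
    """Return the representation of any positive example, or None if none exists."""
    for x_i, label in sample:
        if label == 1:
            return x_i
    return None
```

Note that whenever `learn` returns a representation, that value is copied verbatim from the sample, which is exactly the behavior that differential privacy rules out.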

Note that this argument breaks down if one tries to show that HashPoint is not privately 
properly learnable. While it is infeasible to privately produce a representation $x^*$ for which $f_{x^*}$ is a 
good hypothesis, the hypothesis $h(x) = \chi\{H(x) = H(x_i)\}$ is equal as a function to every good $f_{x^*}$. 
Moreover, this hypothesis can be constructed privately as long as the sample contains sufficiently 
many positive examples. 

We make this discussion formal by constructing a concept class ValidSig based on super-secure 
digital signature schemes, which can be constructed from one-way functions. Our use of signatures 
to derive hardness results for private proper learning is very analogous to prior hardness results for 
synthetic data generation [DNR+09, UV11]. 

Definition 5.1. A digital signature scheme is a triple of algorithms (Gen, Sign, Ver) where 

• $\mathsf{Gen}(1^\lambda)$ produces a key pair $(\mathsf{sk}, \mathsf{vk})$. 

• $\mathsf{Sign}(\mathsf{sk}, m)$ takes the private signing key $\mathsf{sk}$ and a message $m \in \{0,1\}^*$ and produces a 
signature $\sigma$ for the message $m$. 

• $\mathsf{Ver}(\mathsf{vk}, m, \sigma)$ takes the public verification key $\mathsf{vk}$, a message $m$, and a signature $\sigma$, and 
(deterministically) outputs a bit indicating whether $\sigma$ is a valid signature for $m$. 

The correctness property of a digital signature scheme is that for every $(\mathsf{sk}, \mathsf{vk}) \leftarrow_R \mathsf{Gen}(1^\lambda)$, every 
message $m \in \{0,1\}^*$, and every signature $\sigma \leftarrow_R \mathsf{Sign}(\mathsf{sk}, m)$, we have $\mathsf{Ver}(\mathsf{vk}, m, \sigma) = 1$. 

Definition 5.2. A digital signature scheme is super-secure under adaptive chosen-plaintext attacks 
if all efficient adversaries A win the following weak forgery game with negligible probability: 
• The challenger samples $(\mathsf{sk}, \mathsf{vk}) \leftarrow_R \mathsf{Gen}(1^\lambda)$. 

• The adversary $\mathcal{A}$ is given $\mathsf{vk}$ and oracle access to $\mathsf{Sign}(\mathsf{sk}, \cdot)$. It adaptively queries the signing 
oracle, obtaining a set $Q$ of message-signature pairs. It then outputs a forgery $(m^*, \sigma^*)$. 

• The value of the game is 1 iff $\mathsf{Ver}(\mathsf{vk}, m^*, \sigma^*) = 1$ and $(m^*, \sigma^*) \notin Q$. 

It is known that super-secure digital signature schemes can be constructed from one-way 
functions [NY89, Rom90, KK05, Gol04]. 

We now describe our concept class ValidSig. Let (Gen, Sign, Ver) be a super-secure digital sig¬ 
nature scheme. We define a concept class ValidSig as follows. Fix the message length l. For every 
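$(\mathsf{vk}, m, \sigma)$ with $m \in \{0,1\}^{\ell}$ and $\mathsf{Ver}(\mathsf{vk}, m, \sigma) = 1$, define the concept 

$$f_{\mathsf{vk}, m, \sigma}(\mathsf{vk}', m', \sigma') = \begin{cases} 1 & \text{if } \mathsf{vk}' = \mathsf{vk} \text{ and } \mathsf{Ver}(\mathsf{vk}', m', \sigma') = 1 \\ 0 & \text{otherwise.} \end{cases}$$ 
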
For convenience, we also include the all-zeroes hypothesis in ValidSig, with representation Y. 

Theorem 5.3. Let $\alpha, \beta > 0$. There exists a proper PAC learning algorithm L for the concept 
class ValidSig achieving error $\alpha$ and confidence $1 - \beta$. Moreover, L is efficient (running in time 
polynomial in the parameters $k$, $1/\alpha$, $\log(1/\beta)$). 


Algorithm 3 Learner L for ValidSig 

1. Request examples $\{((\mathsf{vk}'_1, m'_1, \sigma'_1), b_1), \ldots, ((\mathsf{vk}'_n, m'_n, \sigma'_n), b_n)\}$ for $n = \lceil \log(1/\beta)/\alpha \rceil$. 

2. Identify an $i$ for which $b_i = 1$ and return the representation $(\mathsf{vk}'_i, m'_i, \sigma'_i)$. If no such $i$ exists, 
return Y representing the all-zeroes hypothesis. 
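
A direct transcription of Algorithm 3 might look like the sketch below. It assumes the logarithm in the sample size is natural (the text does not state the base) and uses `None` as a hypothetical stand-in for the representation of the all-zeroes hypothesis; the function names are illustrative.

```python
import math


def sample_size(alpha: float, beta: float) -> int:
    """n = ceil(log(1/beta) / alpha), with log assumed to be the natural log."""
    return math.ceil(math.log(1.0 / beta) / alpha)


def learner(examples):
    """examples: list of ((vk, m, sigma), label) pairs.

    Returns the first positively labeled triple as the representation,
    or None to denote the all-zeroes hypothesis.
    """
    for representation, label in examples:
        if label == 1:
            return representation
    return None
```

The learner never calls the verification algorithm: a positive label already certifies that the example is a valid representation of a concept equal to the target.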


Proof. Fix a target concept $f_{\mathsf{vk}, m, \sigma} \in$ ValidSig and a distribution $\mathcal{D}$ on examples. Let POS denote 
the set of examples $(\mathsf{vk}', m', \sigma')$ on which $f_{\mathsf{vk}, m, \sigma}(\mathsf{vk}', m', \sigma') = 1$. We divide the analysis of the 
learner into two cases based on the weight $\mathcal{D}$ places on the set POS. 

Case 1: $\mathcal{D}$ places at least $\alpha$ weight on POS. Then L receives a positive example with probability 
at least $1 - (1 - \alpha)^n \ge 1 - \beta$, and is thus able to identify a concept that equals the target concept. 

Case 2: $\mathcal{D}$ places less than $\alpha$ weight on POS. If L gets a positive example, then the analysis of 
Case 1 applies. Otherwise, the all-zeroes hypothesis is $\alpha$-good. 


□ 
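
The probability bound used in Case 1 can be checked in one line, assuming that log denotes the natural logarithm (the base is not stated in the text):

```latex
(1-\alpha)^n \;\le\; e^{-\alpha n} \;\le\; e^{-\log(1/\beta)} \;=\; \beta,
\qquad \text{using } 1-\alpha \le e^{-\alpha} \text{ and } n \ge \log(1/\beta)/\alpha .
```

Hence the learner sees at least one positive example with probability at least $1 - \beta$ whenever POS has weight at least $\alpha$.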


We now prove the hardness of properly privately learning ValidSig by constructing an example 
reidentification scheme for properly learning this concept class. Our example reidentification scheme 
yields a hard distribution even when the error parameter a is taken to be inverse-polynomially close 
to 1. 


Theorem 5.4. Let $\gamma(n)$ and $\xi(n)$ be noticeable functions. Let $(\mathsf{Gen}, \mathsf{Sign}, \mathsf{Ver})$ be a super-secure 
digital signature scheme. Then there exists an (efficient) $(\alpha = 1 - \gamma, \xi)$-example reidentification 
scheme $(\mathsf{Gen}_{ex}, \mathsf{Trace}_{ex})$ for representation learning the concept class ValidSig. 
We now give the proof of Theorem 5.4. 

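Proof. We construct an example reidentification scheme for ValidSig as follows. The algorithm $\mathsf{Gen}_{ex}$ 
samples $(\mathsf{sk}, \mathsf{vk}) \leftarrow_R \mathsf{Gen}(1^\lambda)$, a message $m \in \{0,1\}^{\ell}$, and a signature $\sigma \leftarrow_R \mathsf{Sign}(\mathsf{sk}, m)$, yielding 
a concept $f_{\mathsf{vk}, m, \sigma}$. Let $\mathcal{D}$ be the distribution of $(\mathsf{vk}, m, \mathsf{Sign}(\mathsf{sk}, m))$ for random $m \leftarrow_R \{0,1\}^{\ell}$. 
$\mathsf{Gen}_{ex}$ then samples $x_0, x_1, \ldots, x_n$ i.i.d. from $\mathcal{D}$. Given a representation $(\mathsf{vk}^*, m^*, \sigma^*)$, the algorithm 
$\mathsf{Trace}_{ex}$ simply identifies an index $i$ for which $x_i = (\mathsf{vk}^*, m^*, \sigma^*)$, and outputs $\bot$ if none is found. 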
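
The two algorithms just described can be sketched as below. The signature scheme here is a hypothetical stand-in: `toy_sign` is an HMAC, which is a MAC rather than a publicly verifiable signature, and `vk` is only a label derived from the key. The names `gen_ex`, `trace_ex`, and `ell_bytes` are illustrative.

```python
import hashlib
import hmac
import os


def toy_sign(sk: bytes, m: bytes) -> bytes:
    """HMAC stand-in for Sign (assumption; not a real signature scheme)."""
    return hmac.new(sk, m, hashlib.sha256).digest()


def gen_ex(n: int, ell_bytes: int = 16):
    """Sample a target concept and examples x_0, ..., x_n from the same distribution."""
    sk = os.urandom(32)
    vk = hashlib.sha256(sk).digest()  # toy label for the key pair, not a real vk

    def draw():
        m = os.urandom(ell_bytes)
        return (vk, m, toy_sign(sk, m))

    target = draw()  # (vk, m, sigma) naming the target concept
    sample = [draw() for _ in range(n + 1)]
    return target, sample


def trace_ex(sample, representation):
    """Return an index i with sample[i] == representation, or None if there is none."""
    for i, x in enumerate(sample):
        if x == representation:
            return i
    return None
```

Tracing is plain equality testing: it accuses an example only if the learner's output reproduces that example exactly.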

We first verify completeness for this scheme. Let L be a learner for ValidSig using n examples. If 
the representation $(\mathsf{vk}^*, m^*, \sigma^*)$ produced by L represents a $(1 - \gamma)$-good hypothesis, then it must 
be the case that $\mathsf{vk}^* = \mathsf{vk}$ and $\mathsf{Ver}(\mathsf{vk}, m^*, \sigma^*) = 1$. Thus, if L violates the completeness condition, 
that is, if it outputs such a representation that does not appear in its sample, 
it can be used to construct the weak forgery adversary $\mathcal{A}$ (Algorithm 4) that succeeds with noticeable 
probability $\xi$. 


Algorithm 4 Weak forgery adversary A 

1. Query the signing oracle on random messages $m'_1, \ldots, m'_n \leftarrow_R \{0,1\}^{\ell}$, obtaining signatures 
$\sigma'_1, \ldots, \sigma'_n$. 

2. Run L on the labeled examples $((\mathsf{vk}, m'_1, \sigma'_1), 1), \ldots, ((\mathsf{vk}, m'_n, \sigma'_n), 1)$, obtaining a 
representation $(\mathsf{vk}^*, m^*, \sigma^*)$. 

3. Output the forgery $(m^*, \sigma^*)$. 
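
The reduction in Algorithm 4, together with the weak forgery game of Definition 5.2, can be sketched as follows. As a loudly stated assumption, the signature scheme is replaced by an HMAC stand-in, so verification is performed by the challenger with the secret key rather than publicly with `vk`; all function names are hypothetical.

```python
import hashlib
import hmac
import os


def toy_gen():
    sk = os.urandom(32)
    return sk, hashlib.sha256(sk).digest()  # (sk, toy label playing the role of vk)


def toy_sign(sk: bytes, m: bytes) -> bytes:
    return hmac.new(sk, m, hashlib.sha256).digest()


def toy_ver(sk: bytes, m: bytes, sig: bytes) -> bool:
    # Toy only: a real scheme verifies with vk alone.
    return hmac.compare_digest(toy_sign(sk, m), sig)


def weak_forgery_game(adversary, n: int, ell_bytes: int = 16) -> int:
    """Return 1 iff the forgery verifies and is not a queried (message, signature) pair."""
    sk, vk = toy_gen()
    queried = set()

    def sign_oracle(m: bytes) -> bytes:
        sig = toy_sign(sk, m)
        queried.add((m, sig))
        return sig

    m_star, sig_star = adversary(vk, sign_oracle, n, ell_bytes)
    return int(toy_ver(sk, m_star, sig_star) and (m_star, sig_star) not in queried)


def adversary_from_learner(learner):
    """Algorithm 4: feed signed random messages to the learner as positive examples."""
    def adversary(vk, sign_oracle, n, ell_bytes):
        examples = []
        for _ in range(n):
            m = os.urandom(ell_bytes)
            examples.append(((vk, m, sign_oracle(m)), 1))
        _vk_star, m_star, sig_star = learner(examples)
        return m_star, sig_star
    return adversary


def copying_learner(examples):
    """A learner that returns one of its own examples (as Algorithm 3 does)."""
    return examples[0][0]
```

A learner that copies a sample point produces a valid pair that lies in the queried set, so the game value is 0; only a learner that outputs a valid representation absent from its sample would win, which is the event that super-security rules out.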


Now we verify soundness for the scheme. Observe that for any $i$, the sample $S_{-i}$ contains no 
information about message $m_i$. Therefore, the learner has a $2^{-\ell} = \mathrm{negl}(k)$ probability of producing 
a representation containing message $m_i$, proving soundness. 

□ 